Working with Jobs#

Use this guide as a reference to programmatically build, run and inspect CryoSPARC jobs with cryosparc-tools. The following capabilities are covered:

  • Creating jobs

  • Setting job parameters

  • Connecting job inputs and outputs

  • Queuing and running jobs

  • Inspecting job outputs, files and assets

Note

If you have never worked with CryoSPARC jobs and results before, first read the Creating and Running Jobs page in the CryoSPARC Guide.

To get started, first initialize the CryoSPARC client from the cryosparc.tools module.

from cryosparc.tools import CryoSPARC

cs = CryoSPARC(host="cryoem0.sbi", base_port=40000)
assert cs.test_connection()
Connection succeeded to CryoSPARC command_core at http://cryoem0.sbi:40002
Connection succeeded to CryoSPARC command_vis at http://cryoem0.sbi:40003
Connection succeeded to CryoSPARC command_rtp at http://cryoem0.sbi:40005

Creating Jobs#

In CryoSPARC, cryo-EM data is processed by Jobs such as Import Movies and Ab-Initio Reconstruction. Jobs can import data from disk, process it and output Results that may be connected to other jobs.

Each job has an associated machine-readable type key that must be specified to create it. Show a table of available job types with the CryoSPARC.print_job_types() function. Optionally specify a section argument to only show job types from a specific section.

For example, to list available extraction and refinement job types:

cs.print_job_types(section=["extraction", "refinement"])
Section    | Job                              | Title                            
=================================================================================
extraction | extract_micrographs_multi        | Extract From Micrographs (GPU)   
           | extract_micrographs_cpu_parallel | Extract From Micrographs (CPU)   
           | downsample_particles             | Downsample Particles             
           | restack_particles                | Restack Particles                
refinement | homo_refine_new                  | Homogeneous Refinement           
           | hetero_refine                    | Heterogeneous Refinement         
           | nonuniform_refine_new            | Non-uniform Refinement           
           | homo_reconstruct                 | Homogeneous Reconstruction Only  
           | hetero_reconstruct               | Heterogeneous Reconstruction Only

This information is also available as a Python list with job.get_job_specs().

Use the CryoSPARC.find_project() function to load a project to work in:

project = cs.find_project("P251")

Create a new job with the project.create_job() function. Specify a workspace UID and a job type (such as one of the types listed above):

job = project.create_job("W1", "extract_micrographs_cpu_parallel")
job.uid, job.status
('J96', 'building')

Note the UID of the new job in the given workspace.

You may also use project.find_job() to load a job that was created manually in the CryoSPARC interface:

job = project.find_job("J96")
job.uid, job.type, job.status
('J96', 'extract_micrographs_cpu_parallel', 'building')

Setting Parameters#

A newly-created job has status building. You may change parameters and connect outputs while a job is in this mode.

Use job.print_param_spec() to show a table of available parameters. The first column lists the machine-readable parameter name that may be used to assign this value:

job.print_param_spec()
Param                 | Title                                  | Type    | Default
==================================================================================
bin_size_pix          | Fourier-crop to box size (pix)         | number  | None   
bin_size_pix_small    | Second (small) F-crop box size (pix)   | number  | None   
box_size_pix          | Extraction box size (pix)              | number  | 256    
compute_num_cores     | Number of CPU cores                    | number  | 4      
flip_x                | Flip mic. in x before extract?         | boolean | False  
flip_y                | Flip mic. in y before extract?         | boolean | False  
force_reextract_CTF   | Force re-extract CTFs from micrographs | boolean | False  
num_extract           | Number of mics to extract              | number  | None   
output_f16            | Save results in 16-bit floating point  | boolean | False  
recenter_using_shifts | Recenter using aligned shifts          | boolean | True   
scale_const_override  | Scale constant (override)              | number  | None   

This information is also available as a Python dictionary from job.doc.params_base.

Based on the titles, you may use the CryoSPARC web interface to browse detailed descriptions of these parameters.

Use job.set_param() to update a parameter. It returns True if the parameter was successfully updated:

job.set_param("box_size_pix", 448)
job.set_param("recenter_using_shifts", False)
True

Connecting Inputs and Outputs#

Most jobs also require inputs:

  • An Input is a connection to another parent job’s cryo-EM data Output

    • e.g., a list of micrographs, picked particles or a reconstructed volume

  • An Output is a group of low-level results produced when a job finishes running

  • Each Result includes various data and metadata about its parent output

    • e.g., motion correction information, computed CTF or particle blobs

  • Inputs have Slots that each correspond to an output result

In the CryoSPARC web interface, you may inspect the available inputs and outputs from a job’s “Inputs and Parameters” tab and “Outputs” tab, respectively.

With cryosparc-tools, use job.print_input_spec() to show a table of available input requirements for a given job:

job.print_input_spec()
Input       | Title       | Type     | Required? | Input Slots     | Slot Types      | Slot Required?
=====================================================================================================
micrographs | Micrographs | exposure | ✓ (1+)    | micrograph_blob | micrograph_blob | ✓             
            |             |          |           | background_blob | stat_blob       | ✕             
            |             |          |           | mscope_params   | mscope_params   | ✓             
            |             |          |           | ctf             | ctf             | ✕             
particles   | Particles   | particle | ✕ (0+)    | location        | location        | ✓             
            |             |          |           | alignments2D    | alignments2D    | ✕             
            |             |          |           | alignments3D    | alignments3D    | ✕             

This information is also available as a Python list from job.doc.input_slot_groups.

This example extraction job requires inputs micrographs and particles. These must be connected from one or more parent jobs that produce the same types (exposure and particle, respectively) as outputs, e.g., an Inspect Particle Picks job. Note also the required low-level slot connections:

  • Requires one or more connections from a job output with type exposure, i.e., CTF-corrected micrographs

    • Must include low-level slots micrograph_blob and mscope_params

    • May include optional low-level slots stat_blob and ctf

  • Requires zero or more connections from a job output with type particle, i.e., particle pick locations

    • Must include low-level slot location

    • May include optional low-level slots alignments2D and alignments3D

The job cannot run if the required low-level slots are not connected. If provided, optional low-level slots may be used by the job for additional computation and results. See the main CryoSPARC Guide for details about how inputs and slots are used by specific job types.
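The matching rule above boils down to a simple check: every required input slot must be covered by a slot of the candidate output. The following is an illustrative sketch of that rule with plain Python; the function and data below are hypothetical and not part of the cryosparc-tools API.

```python
# Hypothetical helper illustrating the slot-matching rule described above.
def missing_required_slots(required_slots, output_slots):
    """Return required input slots that the candidate output does not provide."""
    return [slot for slot, required in required_slots if required and slot not in output_slots]

# Required slots for the example "micrographs" input: (slot name, required?)
micrograph_input_slots = [
    ("micrograph_blob", True),
    ("background_blob", False),
    ("mscope_params", True),
    ("ctf", False),
]

# Slots provided by the parent job's "micrographs" output
parent_output_slots = {"micrograph_blob", "ctf", "mscope_params", "background_blob"}

print(missing_required_slots(micrograph_input_slots, parent_output_slots))  # []
```

An empty list means the connection is valid; any names returned are the missing required slots that would prevent the job from running.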

Load the job or jobs which will provide the required inputs:

parent_job = project.find_job("J13")
parent_job.type, parent_job.status
('inspect_picks_v2', 'completed')

Inspect its outputs with job.print_output_spec():

parent_job.print_output_spec()
Output      | Title                | Type     | Result Slots                 | Result Types   
==============================================================================================
micrographs | Micrographs accepted | exposure | micrograph_blob              | micrograph_blob
            |                      |          | ctf                          | ctf            
            |                      |          | mscope_params                | mscope_params  
            |                      |          | background_blob              | stat_blob      
            |                      |          | micrograph_thumbnail_blob_1x | thumbnail_blob 
            |                      |          | micrograph_thumbnail_blob_2x | thumbnail_blob 
            |                      |          | ctf_stats                    | ctf_stats      
            |                      |          | micrograph_blob_non_dw       | micrograph_blob
            |                      |          | rigid_motion                 | motion         
            |                      |          | spline_motion                | motion         
            |                      |          | movie_blob                   | movie_blob     
            |                      |          | gain_ref_blob                | gain_ref_blob  
particles   | Particles accepted   | particle | location                     | location       
            |                      |          | pick_stats                   | pick_stats     
            |                      |          | ctf                          | ctf            

This information is also available as a Python list from job.doc.output_result_groups.

The types of the two outputs micrographs and particles match the types of the two required inputs and also have all the required slots. Connect them to the parent job with the job.connect() function:

job.connect(
    target_input="micrographs",
    source_job_uid=parent_job.uid,
    source_output="micrographs",
)
job.connect(
    target_input="particles",
    source_job_uid=parent_job.uid,
    source_output="particles",
)
True

Note

The input and output names do not always match as they do in this case. For example, if the parent output is named micrographs_accepted, specify source_output="micrographs_accepted".

Queuing and Running#

Once parameters are set and required inputs are connected, the job is ready to run. Use the job.queue() function to send the job to the CryoSPARC scheduler for execution on a given compute node or cluster.

job.queue(lane="cryoem5")

Omit the lane argument to run directly on the current workstation or master. If required, wait until the job finishes with the job.wait_for_done() function:

job.wait_for_done(error_on_incomplete=True)
'completed'

The error_on_incomplete=True flag causes a Python exception if the job fails or is killed before completing successfully.

A running job may be killed with job.kill(). A queued, completed, killed or failed job may be cleared with job.clear(). After clearing, the job goes back to building status.

Inspecting Results#

While running, jobs produce various kinds of output files and associated metadata. These include:

  • Files such as motion-corrected micrographs, extracted particles, reconstructed volumes, etc.

  • .cs file datasets with computed metadata

  • Image assets and plots for display in the web interface

Use the job.list_files() function to get a list of files in the job’s directory:

job.list_files()
['J96_micrographs.csg',
 'J96_particles.csg',
 'J96_passthrough_micrographs.cs',
 'J96_passthrough_micrographs_incomplete.cs',
 'J96_passthrough_particles.cs',
 'events.bson',
 'extract',
 'extracted_particles.cs',
 'gridfs_data',
 'incomplete_micrographs.cs',
 'job.json',
 'job.log',
 'picked_micrographs.cs']

Specify a subfolder to show files in a specific subdirectory such as extract:

extracted = job.list_files("extract")
extracted[0]
'extract/002297077740060436393_14sep05c_c_00003gr_00014sq_00009hl_00004es.frames_patch_aligned_doseweighted_particles.mrc'

Any file in a job directory may be downloaded for inspection with job.download_file():

job.download_file(extracted[0], target="sample.mrc")
with open("sample.mrc", "rb") as f:
    print(f"Downloaded {len(f.read())} bytes")
Downloaded 496141312 bytes

target may be a file path or writable file handle. You may also use job.download_dataset() and job.load_output() to download .cs files directly into Dataset objects (details in the next section), or job.download_mrc() to download .mrc files as Numpy arrays:

header, data = job.download_mrc(extracted[0])
print(f"Downloaded {data.nbytes} byte particle stack with {header.nz} particles")
Downloaded 496140288 byte particle stack with 618 particles

Datasets#

All cryo-EM data processed in CryoSPARC have associated metadata and results that must be passed between jobs. CryoSPARC uses .cs Dataset files to do this.

A Dataset is a table where each row represents a unique cryo-EM data entity such as an exposure, particle, template, volume, etc. Each column is a data field associated with that entity such as path on disk, pixel size, dimensions, X/Y position, etc.

.cs files are binary encodings of this tabular data.
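Conceptually, a Dataset is columnar: a mapping from field names to equal-length columns, with one entry per cryo-EM entity. A minimal stand-in using plain Python lists (illustrative only; the real Dataset class stores numpy arrays):

```python
# Toy columnar table illustrating the Dataset layout described above.
# Field names and values are made up for illustration.
table = {
    "uid":          [101, 102, 103],
    "blob/path":    ["J96/extract/a.mrc", "J96/extract/b.mrc", "J96/extract/c.mrc"],
    "blob/psize_A": [0.6575, 0.6575, 0.6575],
}

def row(table, i):
    """Materialize row i as a dict, similar to iterating Dataset.rows()."""
    return {field: column[i] for field, column in table.items()}

print(row(table, 1)["blob/path"])  # J96/extract/b.mrc
```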

Use job.download_dataset() to load a .cs file from the job directory. Use pandas to inspect the downloaded dataset in Jupyter or ipython:

import pandas as pd

particles = job.download_dataset("extracted_particles.cs")
pd.DataFrame(particles.rows())
blob/idx blob/import_sig blob/path blob/psize_A blob/shape blob/sign location/center_x_frac location/center_y_frac location/exp_group_id location/micrograph_path location/micrograph_psize_A location/micrograph_shape location/micrograph_uid uid
0 0 0 J96/extract/009270517818331954156_14sep05c_000... 0.6575 [448, 448] -1.0 0.236207 0.701667 0 J2/motioncorrected/009270517818331954156_14sep... 0.0 [7676, 7420] 9270517818331954156 5182375780654809529
1 1 0 J96/extract/009270517818331954156_14sep05c_000... 0.6575 [448, 448] -1.0 0.934483 0.753333 0 J2/motioncorrected/009270517818331954156_14sep... 0.0 [7676, 7420] 9270517818331954156 12660056651751289214
2 2 0 J96/extract/009270517818331954156_14sep05c_000... 0.6575 [448, 448] -1.0 0.627586 0.163333 0 J2/motioncorrected/009270517818331954156_14sep... 0.0 [7676, 7420] 9270517818331954156 17971771557537199412
3 3 0 J96/extract/009270517818331954156_14sep05c_000... 0.6575 [448, 448] -1.0 0.413793 0.448333 0 J2/motioncorrected/009270517818331954156_14sep... 0.0 [7676, 7420] 9270517818331954156 17954957875627625872
4 4 0 J96/extract/009270517818331954156_14sep05c_000... 0.6575 [448, 448] -1.0 0.441379 0.311667 0 J2/motioncorrected/009270517818331954156_14sep... 0.0 [7676, 7420] 9270517818331954156 5996321661655483102
... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
12225 624 0 J96/extract/011310893595949852984_14sep05c_c_0... 0.6575 [448, 448] -1.0 0.951724 0.646667 0 J2/motioncorrected/011310893595949852984_14sep... 0.0 [7676, 7420] 11310893595949852984 13269417710913639089
12226 625 0 J96/extract/011310893595949852984_14sep05c_c_0... 0.6575 [448, 448] -1.0 0.439655 0.131667 0 J2/motioncorrected/011310893595949852984_14sep... 0.0 [7676, 7420] 11310893595949852984 12819948907579588581
12227 626 0 J96/extract/011310893595949852984_14sep05c_c_0... 0.6575 [448, 448] -1.0 0.627586 0.196667 0 J2/motioncorrected/011310893595949852984_14sep... 0.0 [7676, 7420] 11310893595949852984 4627702747760153532
12228 627 0 J96/extract/011310893595949852984_14sep05c_c_0... 0.6575 [448, 448] -1.0 0.551724 0.155000 0 J2/motioncorrected/011310893595949852984_14sep... 0.0 [7676, 7420] 11310893595949852984 13411574058928503699
12229 628 0 J96/extract/011310893595949852984_14sep05c_c_0... 0.6575 [448, 448] -1.0 0.968966 0.325000 0 J2/motioncorrected/011310893595949852984_14sep... 0.0 [7676, 7420] 11310893595949852984 510935515943909986

12230 rows × 14 columns

Each column name has the format {slot}/{field}, e.g., ctf/amp_contrast or blob/path. An alternative definition of a low-level result in CryoSPARC is all the fields in a result dataset that share the same prefix.
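Under that naming convention, the fields belonging to one low-level result can be recovered by grouping column names on their slot prefix. A small illustrative sketch, not a cryosparc-tools function:

```python
from collections import defaultdict

def group_fields_by_slot(field_names):
    """Group '{slot}/{field}' column names by slot prefix; 'uid' has no slot."""
    slots = defaultdict(list)
    for name in field_names:
        slot, _, field = name.partition("/")
        # Names without a "/" (like uid) go under the empty-string key
        slots[slot if field else ""].append(field or name)
    return dict(slots)

fields = ["uid", "blob/path", "blob/psize_A", "ctf/amp_contrast", "location/center_x_frac"]
print(group_fields_by_slot(fields))
# {'': ['uid'], 'blob': ['path', 'psize_A'], 'ctf': ['amp_contrast'], 'location': ['center_x_frac']}
```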

uid is a special numeric field which CryoSPARC uses to uniquely identify, join and de-duplicate metadata in input datasets.

Job output datasets are generally split up into two files:

  1. The main result dataset, which includes new data created by this job

  2. The passthrough dataset, which includes data inherited from the input dataset (.cs files with “passthrough” in the file name)
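Conceptually, combining the main and passthrough datasets is a join on the shared uid field. An illustrative sketch with plain dicts and made-up values (CryoSPARC performs the real merge internally):

```python
# Illustrative only: join main and passthrough rows on the shared uid field.
main = [
    {"uid": 101, "blob/path": "J96/extract/a.mrc"},
    {"uid": 102, "blob/path": "J96/extract/b.mrc"},
]
passthrough = [
    {"uid": 101, "ctf/df1_A": 12453.6},
    {"uid": 102, "ctf/df1_A": 12321.4},
]

# Index passthrough rows by uid, then merge each main row with its match
by_uid = {row["uid"]: row for row in passthrough}
combined = [{**row, **by_uid[row["uid"]]} for row in main]
print(combined[0])  # {'uid': 101, 'blob/path': 'J96/extract/a.mrc', 'ctf/df1_A': 12453.6}
```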

Use the job.load_output() function, which combines these two datasets and allows filtering for specific result slots. This provides a more convenient interface than job.download_dataset():

particles = job.load_output("particles", slots=["location", "ctf"])
pd.DataFrame(particles.rows())
ctf/accel_kv ctf/amp_contrast ctf/anisomag ctf/bfactor ctf/cs_mm ctf/df1_A ctf/df2_A ctf/df_angle_rad ctf/exp_group_id ctf/phase_shift_rad ... location/exp_group_id location/micrograph_path location/micrograph_shape location/micrograph_uid location/min_dist_A pick_stats/angle_rad pick_stats/ncc_score pick_stats/power pick_stats/template_idx uid
0 300.0 0.1 [0.0, 0.0, 0.0, 0.0] 0.0 2.7 12453.643555 12341.103516 4.694420 0 0.0 ... 0 J2/motioncorrected/009270517818331954156_14sep... [7676, 7420] 9270517818331954156 100.0 0.000000 0.821854 686.504211 3 5182375780654809529
1 300.0 0.1 [0.0, 0.0, 0.0, 0.0] 0.0 2.7 12321.397461 12208.857422 4.694420 0 0.0 ... 0 J2/motioncorrected/009270517818331954156_14sep... [7676, 7420] 9270517818331954156 100.0 0.000000 0.795290 801.999390 3 12660056651751289214
2 300.0 0.1 [0.0, 0.0, 0.0, 0.0] 0.0 2.7 12363.504883 12250.964844 4.694420 0 0.0 ... 0 J2/motioncorrected/009270517818331954156_14sep... [7676, 7420] 9270517818331954156 100.0 0.959931 0.764936 729.822632 3 17971771557537199412
3 300.0 0.1 [0.0, 0.0, 0.0, 0.0] 0.0 2.7 12401.663086 12289.123047 4.694420 0 0.0 ... 0 J2/motioncorrected/009270517818331954156_14sep... [7676, 7420] 9270517818331954156 100.0 2.617994 0.746300 778.620056 3 17954957875627625872
4 300.0 0.1 [0.0, 0.0, 0.0, 0.0] 0.0 2.7 12421.460938 12308.920898 4.694420 0 0.0 ... 0 J2/motioncorrected/009270517818331954156_14sep... [7676, 7420] 9270517818331954156 100.0 0.436332 0.720111 862.048706 3 5996321661655483102
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
12225 300.0 0.1 [0.0, 0.0, 0.0, 0.0] 0.0 2.7 15801.009766 15617.719727 -1.555879 0 0.0 ... 0 J2/motioncorrected/011310893595949852984_14sep... [7676, 7420] 11310893595949852984 100.0 1.570796 0.202069 727.920898 3 13269417710913639089
12226 300.0 0.1 [0.0, 0.0, 0.0, 0.0] 0.0 2.7 15720.578125 15537.288086 -1.555879 0 0.0 ... 0 J2/motioncorrected/011310893595949852984_14sep... [7676, 7420] 11310893595949852984 100.0 4.886922 0.201657 779.085510 3 12819948907579588581
12227 300.0 0.1 [0.0, 0.0, 0.0, 0.0] 0.0 2.7 15746.330078 15563.040039 -1.555879 0 0.0 ... 0 J2/motioncorrected/011310893595949852984_14sep... [7676, 7420] 11310893595949852984 100.0 2.443461 0.200585 772.648499 3 4627702747760153532
12228 300.0 0.1 [0.0, 0.0, 0.0, 0.0] 0.0 2.7 15743.455078 15560.165039 -1.555879 0 0.0 ... 0 J2/motioncorrected/011310893595949852984_14sep... [7676, 7420] 11310893595949852984 100.0 1.570796 0.198699 871.016724 3 13411574058928503699
12229 300.0 0.1 [0.0, 0.0, 0.0, 0.0] 0.0 2.7 15814.901367 15631.611328 -1.555879 0 0.0 ... 0 J2/motioncorrected/011310893595949852984_14sep... [7676, 7420] 11310893595949852984 100.0 5.323254 0.198553 646.006897 3 510935515943909986

12230 rows × 29 columns

load_output includes all created and passthrough metadata if slots is not provided.

Dataset contents may be accessed as a dictionary of columns, where each column is a numpy array. Example data access:

expgroups = particles["ctf/exp_group_id"]  # read column
particles["ctf/exp_group_id"][:] = 42  # write column
particles["ctf/exp_group_id"][42] = 1  # write cell
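The column read/write semantics above mirror those of ordinary numpy arrays, so slicing and boolean masking work as expected. A stand-alone illustration using a plain numpy array in place of a real Dataset column:

```python
import numpy as np

# Stand-in for a Dataset column such as particles["ctf/exp_group_id"]
exp_group_id = np.zeros(5, dtype=np.uint32)

exp_group_id[:] = 42           # write the whole column
exp_group_id[2] = 1            # write one cell
selected = exp_group_id == 42  # boolean mask, usable for filtering rows
print(exp_group_id.tolist(), int(selected.sum()))  # [42, 42, 1, 42, 42] 4
```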

If required, use an External job to save modified datasets back to CryoSPARC.

See the Dataset API documentation for all available dataset operations.

Assets#

Jobs may produce image files, plots and other miscellaneous output data that is accessible from the web interface when inspecting a job. These assets are not available on the file system; instead, CryoSPARC stores them in its MongoDB database for fast, frequent access.

Use the job.list_assets() function to view available assets for a job:

assets = job.list_assets()
assets[0]
{'_id': '6560d183562b2c67c7d35754',
 'chunkSize': 2096128,
 'contentType': 'image/png',
 'filename': 'J96_extracted_coordinates_on_j2motioncorrected009270517818331954156_14sep05c_00024sq_00003hl_00002esframes_patch_aligned_doseweightedmrc.png',
 'job_uid': 'J96',
 'length': 867617,
 'md5': '471ab293b92726043c8277cb6964f70b',
 'project_uid': 'P251',
 'uploadDate': '2023-11-24T16:38:27.800000'}

Similar to job.download_file(), download an asset to disk with the job.download_asset() function, providing the asset ID and download location:

job.download_asset(assets[0]["_id"], "image.png")
PosixPath('image.png')

External Jobs#

cryosparc-tools may be integrated into custom cryo-EM workflows to load, modify and save CryoSPARC job results. It may also be used to integrate third-party cryo-EM tools such as Motioncor2, crYOLO or cryoDRGN with CryoSPARC.

External Jobs are special job types used to save these externally-processed results back to CryoSPARC. Read the examples or the API documentation for more details.