Working with Jobs#

Use this guide as a reference to programmatically build, run and inspect CryoSPARC jobs with cryosparc-tools. The following capabilities are covered:

  • Creating jobs

  • Setting job parameters

  • Connecting job inputs and outputs

  • Queuing and running jobs

  • Inspecting job outputs, files and assets

Note

If you have never worked with CryoSPARC jobs and results before, first read the Creating and Running Jobs page in the CryoSPARC Guide.

To get started, first initialize the CryoSPARC client from the cryosparc.tools module.

from cryosparc.tools import CryoSPARC

cs = CryoSPARC("http://cryoem0.sbi:61000")
assert cs.test_connection()
Success: Connected to CryoSPARC API at http://cryoem0.sbi:61000

Use the CryoSPARC.find_project() function to load a project to work in:

project = cs.find_project("P75")

Browsing Jobs#

Use the project.find_jobs() function to get an iterable sequence of jobs within a project:

jobs = project.find_jobs()
job1 = next(jobs)
job2 = next(jobs)
job3 = next(jobs)
print(f"First job: {job1.uid}, {job1.type}")
print(f"Second job: {job2.uid}, {job2.type}")
print(f"Third job: {job3.uid}, {job3.type}")
First job: J259, import_movies
Second job: J260, patch_motion_correction_multi
Third job: J261, patch_ctf_estimation_multi

Specify filters like workspace_uid, type or category to only show a subset of jobs that match those filters.

for job in project.find_jobs(workspace_uid="W40", category="motion_correction"):
    print(f"Job: {job.uid}, {job.type}")
Job: J334, patch_motion_correction_multi
Job: J335, rigid_motion_correction_multi

Similar find functions are available for projects and workspaces.
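To get a quick overview of a project, the iterable returned by project.find_jobs() can be tallied by job type. The following is a minimal sketch: count_job_types is a hypothetical helper name (not part of cryosparc-tools), and it relies only on the find_jobs() filters and the job.type attribute shown above.

```python
from collections import Counter


def count_job_types(project, **filters):
    """Tally jobs in a project by their machine-readable type.

    `project` is assumed to behave like the object returned by
    cs.find_project(); `filters` are passed through to find_jobs().
    """
    return Counter(job.type for job in project.find_jobs(**filters))
```

For example, count_job_types(project, workspace_uid="W40") would return a Counter keyed by job type for that workspace.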

Creating Jobs#

In CryoSPARC, cryo-EM data is processed by Jobs such as Import Movies and Ab-Initio Reconstruction. Jobs can import data from disk, process it and output Results that may be connected to other jobs.

Each job has an associated machine-readable type key that must be specified to create it. Show a table of job types available to create with the CryoSPARC.print_job_types() function. Optionally specify a category argument to only show job types from specific categories.

For example, to list available extraction and refinement job types:

cs.print_job_types(category=["extraction", "refinement"])
Category   | Job                              | Title                            | Stability
============================================================================================
extraction | extract_micrographs_multi        | Extract From Micrographs (GPU)   | stable   
           | extract_micrographs_cpu_parallel | Extract From Micrographs (CPU)   | stable   
           | downsample_particles             | Downsample Particles             | stable   
           | restack_particles                | Restack Particles                | stable   
refinement | homo_refine_new                  | Homogeneous Refinement           | stable   
           | hetero_refine                    | Heterogeneous Refinement         | stable   
           | nonuniform_refine_new            | Non-uniform Refinement           | stable   
           | homo_reconstruct                 | Homogeneous Reconstruction Only  | stable   
           | hetero_reconstruct_new           | Heterogenous Reconstruction Only | stable   

This information is also available as a Python list with cs.job_register.

Create a new job with the project.create_job() function. Specify a workspace UID and a job type (such as one of the types listed above):

job = project.create_job("W40", "extract_micrographs_cpu_parallel")
job.uid, job.status
('J1405', 'building')

Note the UID of the new job in the given workspace.

You may also use project.find_job() to load a job that was manually created in the CryoSPARC interface:

job = project.find_job("J1405")
job.uid, job.type, job.status
('J1405', 'extract_micrographs_cpu_parallel', 'building')

Setting Parameters#

A newly created job has status building. You may change parameters and connect inputs while a job is in this status.

Use job.print_param_spec() to show a table of available parameters. The first column lists the machine-readable parameter name that may be used to assign this value:

job.print_param_spec()
Param                 | Title                                  | Type    | Default
==================================================================================
compute_num_cores     | Number of CPU cores                    | integer | 4      
box_size_pix          | Extraction box size (pix)              | integer | 256    
bin_size_pix          | Fourier-crop to box size (pix)         | integer | None   
bin_size_pix_small    | Second (small) F-crop box size (pix)   | integer | None   
output_f16            | Save results in 16-bit floating point  | boolean | 0      
force_reextract_CTF   | Force re-extract CTFs from micrographs | boolean | 0      
recenter_using_shifts | Recenter using aligned shifts          | boolean | 1      
num_extract           | Number of mics to extract              | integer | None   
flip_x                | Flip mic. in x before extract?         | boolean | 0      
flip_y                | Flip mic. in y before extract?         | boolean | 0      
scale_const_override  | Scale constant (override)              | number  | None   

This information is also available as a Python object from job.full_spec.params.

Using the title, you may look up detailed descriptions of these parameters in the CryoSPARC web interface.

Use job.set_param() to update a parameter. It returns True if the parameter was successfully updated:

job.set_param("box_size_pix", 448)
job.set_param("recenter_using_shifts", False)
True
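When several parameters need to be set at once, the calls can be wrapped in a small loop. This is a sketch under assumptions: apply_params is a hypothetical helper (not part of cryosparc-tools), and it assumes job.set_param() reports an unsuccessful update by returning a falsy value rather than only by raising.

```python
def apply_params(job, params):
    """Set several parameters on a job that is still in building status.

    Returns the names of parameters whose update did not report success,
    assuming set_param() returns a falsy value in that case.
    """
    failed = []
    for name, value in params.items():
        if not job.set_param(name, value):
            failed.append(name)
    return failed
```

For the extraction job above, apply_params(job, {"box_size_pix": 448, "recenter_using_shifts": False}) would issue the same two updates and return an empty list if both succeed.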

Connecting Inputs and Outputs#

Most jobs also require inputs:

  • An Input is a connection to another parent job’s cryo-EM data Output

    • e.g., a list of micrographs, picked particles or a reconstructed volume

  • An Output is a group of low-level results produced when a job finishes running

  • Each Result includes various data and metadata about its parent output

    • e.g., motion correction information, computed CTF or particle blobs

  • Inputs have Slots that each correspond to an output result

In the CryoSPARC web interface, you may inspect the available inputs and outputs from a job’s “Inputs and Parameters” tab and “Outputs” tab, respectively.

With cryosparc-tools, use job.print_input_spec() to show a table of available input requirements for a given job.

job.print_input_spec()
Input       | Title       | Type     | Required? | Input Slots     | Slot Types      | Slot Required?
=====================================================================================================
micrographs | Micrographs | exposure | ✓ (1+)    | micrograph_blob | micrograph_blob | ✓             
            |             |          |           | background_blob | stat_blob       | ✕             
            |             |          |           | mscope_params   | mscope_params   | ✓             
            |             |          |           | ctf             | ctf             | ✕             
particles   | Particles   | particle | ✓ (1+)    | location        | location        | ✓             
            |             |          |           | alignments2D    | alignments2D    | ✕             
            |             |          |           | alignments3D    | alignments3D    | ✕             

This information is also available as a Python object from job.inputs.

This example extraction job requires inputs micrographs and particles. These must be connected from one or more parent jobs that produce the same types (exposure and particle, respectively) as outputs, e.g., an Inspect Particle Picks job. Note also the required low-level slot connections:

  • Requires one or more connections from a job output with type exposure, i.e., CTF-corrected micrographs

    • Must include low-level slots micrograph_blob and mscope_params

    • May include optional low-level slots background_blob (type stat_blob) and ctf

  • Requires one or more connections from a job output with type particle, i.e., particle pick locations

    • Must include low-level slot location

    • May include optional low-level slots alignments2D and alignments3D

The job cannot run if the required low-level slots are not connected. If provided, optional low-level slots may be used by the job for additional computation and results. See the main CryoSPARC Guide for details about how inputs and slots are used by specific job types.
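The required/optional distinction can be illustrated with a plain-Python check that compares an input's slot requirements against the slots a candidate output offers. This is only a sketch: missing_required_slots is a hypothetical helper, it matches slots by name for simplicity, and it is not how CryoSPARC itself validates connections.

```python
def missing_required_slots(input_slots, output_slots):
    """Return required input slot names that the output does not provide.

    input_slots: mapping of slot name -> True if required, False if optional
    output_slots: iterable of result slot names offered by a parent output
    """
    available = set(output_slots)
    return sorted(
        name
        for name, required in input_slots.items()
        if required and name not in available
    )


# Slot requirements for the "micrographs" input, copied from the table above.
micrographs_input = {
    "micrograph_blob": True,
    "background_blob": False,
    "mscope_params": True,
    "ctf": False,
}

# A hypothetical output that offers only two slots.
print(missing_required_slots(micrographs_input, ["micrograph_blob", "ctf"]))
# prints ['mscope_params']
```

An empty list means every required slot is covered; optional slots never appear in the result.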

Load the job or jobs which will provide the required inputs:

parent_job = project.find_job("J345")
parent_job.type, parent_job.status
('inspect_picks_v2', 'completed')

Inspect its outputs with job.print_output_spec():

parent_job.print_output_spec()
Output      | Title                | Type     | Result Slots                 | Result Types    | Passthrough?
=============================================================================================================
micrographs | Micrographs accepted | exposure | micrograph_blob              | micrograph_blob | ✕           
            |                      |          | ctf                          | ctf             | ✓           
            |                      |          | ctf_stats                    | ctf_stats       | ✓           
            |                      |          | rigid_motion                 | motion          | ✓           
            |                      |          | spline_motion                | motion          | ✓           
            |                      |          | mscope_params                | mscope_params   | ✓           
            |                      |          | background_blob              | stat_blob       | ✓           
            |                      |          | micrograph_thumbnail_blob_1x | thumbnail_blob  | ✓           
            |                      |          | micrograph_thumbnail_blob_2x | thumbnail_blob  | ✓           
            |                      |          | micrograph_blob_non_dw       | micrograph_blob | ✓           
            |                      |          | micrograph_blob_non_dw_AB    | micrograph_blob | ✓           
            |                      |          | movie_blob                   | movie_blob      | ✓           
            |                      |          | gain_ref_blob                | gain_ref_blob   | ✓           
particles   | Particles accepted   | particle | location                     | location        | ✕           
            |                      |          | ctf                          | ctf             | ✓           
            |                      |          | pick_stats                   | pick_stats      | ✓           

This information is also available as a Python list from job.outputs.

The types of the two outputs micrographs and particles match the types of the two required inputs and also have all the required slots. Connect them from the parent job to the new job with the job.connect() function:

job.connect(
    target_input="micrographs",
    source_job_uid=parent_job.uid,
    source_output="micrographs",
)
job.connect(
    target_input="particles",
    source_job_uid=parent_job.uid,
    source_output="particles",
)
True
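When a job takes several inputs from the same parent, the repeated job.connect() calls can be driven from a mapping of input names to output names. The sketch below uses only the keyword arguments shown above; connect_outputs is a hypothetical helper name, not part of cryosparc-tools.

```python
def connect_outputs(job, parent_job_uid, connections):
    """Connect several outputs of one parent job to a job being built.

    connections: mapping of target input name -> source output name
    Returns a dict of target input name -> value returned by job.connect().
    """
    results = {}
    for target_input, source_output in connections.items():
        results[target_input] = job.connect(
            target_input=target_input,
            source_job_uid=parent_job_uid,
            source_output=source_output,
        )
    return results
```

The two calls above would then be written as connect_outputs(job, parent_job.uid, {"micrographs": "micrographs", "particles": "particles"}).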

Note

The input and output names do not always match as they do in this case. For example, if the parent output is named micrographs_accepted, specify source_output="micrographs_accepted".

Queuing and Running#

Once parameters are set and required inputs are connected, the job is ready to run. Use the job.queue() function to send the job to the CryoSPARC scheduler for execution on a given compute node or cluster.

job.queue(lane="cryoem3")

Omit the lane argument to run directly on the current workstation or master. If required, wait until the job finishes with the job.wait_for_done() function:

job.wait_for_done(error_on_incomplete=True)
'completed'

The error_on_incomplete=True flag causes a Python exception if the job fails or is killed before completing successfully.

A running job may be killed with job.kill(). A queued, completed, killed or failed job may be cleared with job.clear(). After clearing, the job goes back to building status.
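Queuing and waiting are often done together in scripts. The following sketch combines the two calls described above; run_to_completion is a hypothetical helper name, and it assumes wait_for_done() returns the final status string as in the example output.

```python
def run_to_completion(job, lane=None):
    """Queue a job and block until it finishes.

    If `lane` is None, the lane argument is omitted from job.queue().
    Returns the final status reported by wait_for_done(); with
    error_on_incomplete=True, a failed or killed job raises instead.
    """
    if lane is None:
        job.queue()
    else:
        job.queue(lane=lane)
    return job.wait_for_done(error_on_incomplete=True)
```

For example, run_to_completion(job, lane="cryoem3") would queue the job on that lane and return 'completed' once it succeeds.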

Inspecting Results#

While running, jobs produce various kinds of output files and associated metadata. These include:

  • Files such as motion-corrected micrographs, extracted particles, reconstructed volumes, etc.

  • .cs file datasets with computed metadata

  • Image assets and plots for display in the web interface

Use the job.list_files() function to get a list of files in the job’s directory:

job.list_files()
['J1405_micrographs.csg',
 'J1405_particles.csg',
 'J1405_passthrough_micrographs.cs',
 'J1405_passthrough_micrographs_incomplete.cs',
 'J1405_passthrough_particles.cs',
 'events.bson',
 'extract',
 'extracted_particles.cs',
 'gridfs_data',
 'incomplete_micrographs.cs',
 'job.json',
 'job.log',
 'picked_micrographs.cs']

Specify a subfolder to show files in a specific sub directory such as extract:

extracted = job.list_files("extract")
extracted[0]
'extract/000340930003298771738_14sep05c_c_00003gr_00014sq_00010hl_00002es.frames_patch_aligned_doseweighted_particles.mrc'

Any file in a job directory may be downloaded for inspection with job.download_file():

job.download_file(extracted[0], target="sample.mrc")
with open("sample.mrc", "rb") as f:
    print(f"Downloaded {len(f.read())} bytes")
Downloaded 480887808 bytes

target may be a file path or writable file handle. You may also use job.download_dataset() and job.load_output() to download .cs files directly into Dataset objects (details in next section), or job.download_mrc() to download .mrc files as NumPy arrays:

header, data = job.download_mrc(extracted[0])
print(f"Downloaded {data.nbytes} byte particle stack with {header.nz} particles")
Downloaded 480886784 byte particle stack with 599 particles
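To fetch several files at once, the listing from job.list_files() can be filtered by file suffix and each match passed to job.download_file(). This is a sketch under assumptions: download_matching is a hypothetical helper, and it assumes (as in the examples above) that listed paths are relative to the job directory and that target accepts a string path.

```python
from pathlib import Path, PurePosixPath


def download_matching(job, suffix, dest_dir, subfolder=None):
    """Download every file in a job directory (or subfolder) with a suffix.

    Returns the list of local paths written.
    """
    listing = job.list_files(subfolder) if subfolder else job.list_files()
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    saved = []
    for remote in listing:
        if remote.endswith(suffix):
            # Keep only the file name so nested remote paths land flat in dest.
            local = dest / PurePosixPath(remote).name
            job.download_file(remote, target=str(local))
            saved.append(local)
    return saved
```

For example, download_matching(job, ".cs", "datasets") would skip the .csg files and the extract directory entry in the listing above and download only the .cs datasets.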

Datasets#

All cryo-EM data processed in CryoSPARC have associated metadata and results that must be passed between jobs. CryoSPARC uses .cs Dataset files to do this.

A Dataset is a table where each row represents a unique cryo-EM data entity such as an exposure, particle, template, volume, etc. Each column is a data field associated with that entity such as path on disk, pixel size, dimensions, X/Y position, etc.

.cs files are binary encodings of this tabular data.

Use job.download_dataset() to load a .cs file from the job directory. The dataset appears as a table when inspected in Jupyter:

particles = job.download_dataset("extracted_particles.cs")
particles
uid blob/path blob/idx blob/shape blob/psize_A blob/sign blob/import_sig location/micrograph_uid location/exp_group_id location/micrograph_path location/micrograph_shape location/micrograph_psize_A location/center_x_frac location/center_y_frac location/min_dist_A
0 4991923941886924448 J1405/extract/014810031491354839884_14sep05c_00024sq_00003hl_00002es.frames_patch_aligned_doseweighted_particles.mrc 0 [448 448] 0.6575000286102295 -1.0 0 14810031491354839884 25 J331/motioncorrected/014810031491354839884_14sep05c_00024sq_00003hl_00002es.frames_patch_aligned_doseweighted.mrc [7676 7420] 0.6575000286102295 0.41896551847457886 0.16500000655651093 100.0
1 7129680955360275073 J1405/extract/014810031491354839884_14sep05c_00024sq_00003hl_00002es.frames_patch_aligned_doseweighted_particles.mrc 1 [448 448] 0.6575000286102295 -1.0 0 14810031491354839884 25 J331/motioncorrected/014810031491354839884_14sep05c_00024sq_00003hl_00002es.frames_patch_aligned_doseweighted.mrc [7676 7420] 0.6575000286102295 0.7137930989265442 0.5883333086967468 100.0
2 10579529116942966593 J1405/extract/014810031491354839884_14sep05c_00024sq_00003hl_00002es.frames_patch_aligned_doseweighted_particles.mrc 2 [448 448] 0.6575000286102295 -1.0 0 14810031491354839884 25 J331/motioncorrected/014810031491354839884_14sep05c_00024sq_00003hl_00002es.frames_patch_aligned_doseweighted.mrc [7676 7420] 0.6575000286102295 0.7258620858192444 0.4300000071525574 100.0
3 6492766916285817490 J1405/extract/014810031491354839884_14sep05c_00024sq_00003hl_00002es.frames_patch_aligned_doseweighted_particles.mrc 3 [448 448] 0.6575000286102295 -1.0 0 14810031491354839884 25 J331/motioncorrected/014810031491354839884_14sep05c_00024sq_00003hl_00002es.frames_patch_aligned_doseweighted.mrc [7676 7420] 0.6575000286102295 0.12931033968925476 0.21833333373069763 100.0
4 1452544039939850654 J1405/extract/014810031491354839884_14sep05c_00024sq_00003hl_00002es.frames_patch_aligned_doseweighted_particles.mrc 4 [448 448] 0.6575000286102295 -1.0 0 14810031491354839884 25 J331/motioncorrected/014810031491354839884_14sep05c_00024sq_00003hl_00002es.frames_patch_aligned_doseweighted.mrc [7676 7420] 0.6575000286102295 0.20344828069210052 0.3083333373069763 100.0
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
11608 13782326931696153388 J1405/extract/010952247516538852012_14sep05c_c_00003gr_00014sq_00011hl_00004es.frames_patch_aligned_doseweighted_particles.mrc 562 [448 448] 0.6575000286102295 -1.0 0 10952247516538852012 25 J331/motioncorrected/010952247516538852012_14sep05c_c_00003gr_00014sq_00011hl_00004es.frames_patch_aligned_doseweighted.mrc [7676 7420] 0.6575000286102295 0.30000001192092896 0.24666666984558105 100.0
11609 2038903213416418021 J1405/extract/010952247516538852012_14sep05c_c_00003gr_00014sq_00011hl_00004es.frames_patch_aligned_doseweighted_particles.mrc 563 [448 448] 0.6575000286102295 -1.0 0 10952247516538852012 25 J331/motioncorrected/010952247516538852012_14sep05c_c_00003gr_00014sq_00011hl_00004es.frames_patch_aligned_doseweighted.mrc [7676 7420] 0.6575000286102295 0.274137943983078 0.4583333432674408 100.0
11610 3302784391827733380 J1405/extract/010952247516538852012_14sep05c_c_00003gr_00014sq_00011hl_00004es.frames_patch_aligned_doseweighted_particles.mrc 564 [448 448] 0.6575000286102295 -1.0 0 10952247516538852012 25 J331/motioncorrected/010952247516538852012_14sep05c_c_00003gr_00014sq_00011hl_00004es.frames_patch_aligned_doseweighted.mrc [7676 7420] 0.6575000286102295 0.47068965435028076 0.4650000035762787 100.0
11611 3047055336306861916 J1405/extract/010952247516538852012_14sep05c_c_00003gr_00014sq_00011hl_00004es.frames_patch_aligned_doseweighted_particles.mrc 565 [448 448] 0.6575000286102295 -1.0 0 10952247516538852012 25 J331/motioncorrected/010952247516538852012_14sep05c_c_00003gr_00014sq_00011hl_00004es.frames_patch_aligned_doseweighted.mrc [7676 7420] 0.6575000286102295 0.48965516686439514 0.038333334028720856 100.0

Each column name has the format {slot}/{field}, e.g., ctf/amp_contrast or blob/path. An alternative definition for a low-level result in CryoSPARC is all the fields in a result dataset with the same prefix.

uid is a special numeric field which CryoSPARC uses to uniquely identify, join and de-duplicate metadata in input datasets.
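The {slot}/{field} naming convention makes it easy to see which low-level results a dataset contains. The sketch below groups a list of column names by slot prefix using plain Python; group_fields_by_slot is a hypothetical helper and works on any list of names, independent of the Dataset class.

```python
from collections import defaultdict


def group_fields_by_slot(column_names):
    """Group dataset column names of the form {slot}/{field} by slot.

    Columns without a slash (such as uid) are collected under the key None.
    """
    groups = defaultdict(list)
    for name in column_names:
        slot, sep, field = name.partition("/")
        if sep:
            groups[slot].append(field)
        else:
            groups[None].append(name)
    return dict(groups)


columns = ["uid", "blob/path", "blob/idx", "location/center_x_frac"]
print(group_fields_by_slot(columns))
# prints {None: ['uid'], 'blob': ['path', 'idx'], 'location': ['center_x_frac']}
```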

Job output datasets are generally split up into two files:

  1. The main result dataset, which includes new data created by this job

  2. The passthrough dataset, which includes data inherited from the input dataset (.cs files with “passthrough” in the file name)
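Conceptually, the two files describe the same rows and are related through the uid field. The following toy sketch joins two lists of row dictionaries on uid to show the idea; merge_by_uid is a hypothetical helper, the values are made up for illustration, and this is not how cryosparc-tools implements the combination internally.

```python
def merge_by_uid(main_rows, passthrough_rows):
    """Join two lists of row dicts on their shared "uid" key.

    Rows from main_rows keep their order; fields from the matching
    passthrough row are added without overwriting main fields.
    """
    passthrough_by_uid = {row["uid"]: row for row in passthrough_rows}
    merged = []
    for row in main_rows:
        extra = passthrough_by_uid.get(row["uid"], {})
        merged.append({**extra, **row})
    return merged


# Toy rows with made-up values; real datasets hold many more fields.
main = [{"uid": 1, "blob/idx": 0}, {"uid": 2, "blob/idx": 1}]
passthrough = [{"uid": 2, "ctf/df1_A": 200.0}, {"uid": 1, "ctf/df1_A": 100.0}]
print(merge_by_uid(main, passthrough)[0])
# prints {'uid': 1, 'ctf/df1_A': 100.0, 'blob/idx': 0}
```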

Use the job.load_output() function, which combines these two datasets and allows filtering for specific result slots. This provides a more convenient interface than job.download_dataset():

particles = job.load_output("particles", slots=["location", "ctf"])
particles
uid location/micrograph_uid location/exp_group_id location/micrograph_path location/micrograph_shape location/micrograph_psize_A location/center_x_frac location/center_y_frac location/min_dist_A ctf/type ctf/exp_group_id ctf/accel_kv ctf/cs_mm ctf/amp_contrast ctf/df1_A ctf/df2_A ctf/df_angle_rad ctf/phase_shift_rad ctf/scale ctf/scale_const ctf/shift_A ctf/tilt_A ctf/trefoil_A ctf/tetra_A ctf/anisomag ctf/bfactor
0 4991923941886924448 14810031491354839884 25 J331/motioncorrected/014810031491354839884_14sep05c_00024sq_00003hl_00002es.frames_patch_aligned_doseweighted.mrc [7676 7420] 0.6575000286102295 0.41896551847457886 0.16500000655651093 100.0 spline 25 300.0 2.700000047683716 0.10000000149011612 12443.43359375 12330.8984375 4.693929672241211 0.0 1.0 1.0 [0. 0.] [0. 0.] [0. 0.] [0. 0. 0. 0.] [0. 0. 0. 0.] 0.0
1 7129680955360275073 14810031491354839884 25 J331/motioncorrected/014810031491354839884_14sep05c_00024sq_00003hl_00002es.frames_patch_aligned_doseweighted.mrc [7676 7420] 0.6575000286102295 0.7137930989265442 0.5883333086967468 100.0 spline 25 300.0 2.700000047683716 0.10000000149011612 12325.1123046875 12212.5771484375 4.693929672241211 0.0 1.0 1.0 [0. 0.] [0. 0.] [0. 0.] [0. 0. 0. 0.] [0. 0. 0. 0.] 0.0
2 10579529116942966593 14810031491354839884 25 J331/motioncorrected/014810031491354839884_14sep05c_00024sq_00003hl_00002es.frames_patch_aligned_doseweighted.mrc [7676 7420] 0.6575000286102295 0.7258620858192444 0.4300000071525574 100.0 spline 25 300.0 2.700000047683716 0.10000000149011612 12383.697265625 12271.162109375 4.693929672241211 0.0 1.0 1.0 [0. 0.] [0. 0.] [0. 0.] [0. 0. 0. 0.] [0. 0. 0. 0.] 0.0
3 6492766916285817490 14810031491354839884 25 J331/motioncorrected/014810031491354839884_14sep05c_00024sq_00003hl_00002es.frames_patch_aligned_doseweighted.mrc [7676 7420] 0.6575000286102295 0.12931033968925476 0.21833333373069763 100.0 spline 25 300.0 2.700000047683716 0.10000000149011612 12607.15234375 12494.6171875 4.693929672241211 0.0 1.0 1.0 [0. 0.] [0. 0.] [0. 0.] [0. 0. 0. 0.] [0. 0. 0. 0.] 0.0
4 1452544039939850654 14810031491354839884 25 J331/motioncorrected/014810031491354839884_14sep05c_00024sq_00003hl_00002es.frames_patch_aligned_doseweighted.mrc [7676 7420] 0.6575000286102295 0.20344828069210052 0.3083333373069763 100.0 spline 25 300.0 2.700000047683716 0.10000000149011612 12506.732421875 12394.197265625 4.693929672241211 0.0 1.0 1.0 [0. 0.] [0. 0.] [0. 0.] [0. 0. 0. 0.] [0. 0. 0. 0.] 0.0
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
11608 13782326931696153388 10952247516538852012 25 J331/motioncorrected/010952247516538852012_14sep05c_c_00003gr_00014sq_00011hl_00004es.frames_patch_aligned_doseweighted.mrc [7676 7420] 0.6575000286102295 0.30000001192092896 0.24666666984558105 100.0 spline 25 300.0 2.700000047683716 0.10000000149011612 15721.2802734375 15538.005859375 -1.5557758808135986 0.0 1.0 1.0 [0. 0.] [0. 0.] [0. 0.] [0. 0. 0. 0.] [0. 0. 0. 0.] 0.0
11609 2038903213416418021 10952247516538852012 25 J331/motioncorrected/010952247516538852012_14sep05c_c_00003gr_00014sq_00011hl_00004es.frames_patch_aligned_doseweighted.mrc [7676 7420] 0.6575000286102295 0.274137943983078 0.4583333432674408 100.0 spline 25 300.0 2.700000047683716 0.10000000149011612 15732.8623046875 15549.587890625 -1.5557758808135986 0.0 1.0 1.0 [0. 0.] [0. 0.] [0. 0.] [0. 0. 0. 0.] [0. 0. 0. 0.] 0.0
11610 3302784391827733380 10952247516538852012 25 J331/motioncorrected/010952247516538852012_14sep05c_c_00003gr_00014sq_00011hl_00004es.frames_patch_aligned_doseweighted.mrc [7676 7420] 0.6575000286102295 0.47068965435028076 0.4650000035762787 100.0 spline 25 300.0 2.700000047683716 0.10000000149011612 15708.0078125 15524.7333984375 -1.5557758808135986 0.0 1.0 1.0 [0. 0.] [0. 0.] [0. 0.] [0. 0. 0. 0.] [0. 0. 0. 0.] 0.0
11611 3047055336306861916 10952247516538852012 25 J331/motioncorrected/010952247516538852012_14sep05c_c_00003gr_00014sq_00011hl_00004es.frames_patch_aligned_doseweighted.mrc [7676 7420] 0.6575000286102295 0.48965516686439514 0.038333334028720856 100.0 spline 25 300.0 2.700000047683716 0.10000000149011612 15763.7548828125 15580.48046875 -1.5557758808135986 0.0 1.0 1.0 [0. 0.] [0. 0.] [0. 0.] [0. 0. 0. 0.] [0. 0. 0. 0.] 0.0

load_output includes all created and passthrough metadata if slots is not provided.

Dataset contents may be accessed as a dictionary of columns, where each column is a numpy array. Example data access:

expgroups = particles["ctf/exp_group_id"]  # read column
particles["ctf/exp_group_id"][:] = 42  # write column
particles["ctf/exp_group_id"][42] = 1  # write cell

If required, use an External job to save modified datasets back to CryoSPARC.

See the Dataset API documentation for all available dataset operations.

Assets#

Jobs may produce image files, plots and other miscellaneous output data that is accessible from the web interface when inspecting a job. These assets are not available on the file system; instead, CryoSPARC stores them in its MongoDB database for fast, frequent access.

Use the job.list_assets() function to view available assets for a job:

assets = job.list_assets()
assets[0].id, assets[0].filename
('69208ea074d09e2785654ce4',
 'J1405_extracted_coordinates_on_j331motioncorrected014810031491354839884_14sep05c_00024sq_00003hl_00002esframes_patch_aligned_doseweightedmrc.png')

Similar to job.download_file(), download an asset to disk with job.download_asset(), providing the asset ID and download location:

job.download_asset(assets[0].id, "image.png")
'image.png'
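To save all image assets of a job in one go, the list from job.list_assets() can be filtered by filename and each entry downloaded in turn. This is a minimal sketch: download_assets is a hypothetical helper, and it assumes each asset exposes the id and filename attributes shown above.

```python
from pathlib import Path


def download_assets(job, dest_dir, suffix=".png"):
    """Download every asset of a job whose filename ends with `suffix`.

    Returns the list of local paths passed to job.download_asset().
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    saved = []
    for asset in job.list_assets():
        if asset.filename.endswith(suffix):
            local = dest / asset.filename
            job.download_asset(asset.id, str(local))
            saved.append(local)
    return saved
```

For example, download_assets(job, "plots") would write every .png asset of the job into a local plots directory.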

External Jobs#

cryosparc-tools may be integrated into custom cryo-EM workflows to load, modify and save CryoSPARC job results. It may also be used to integrate third-party cryo-EM tools such as Motioncor2, crYOLO or cryoDRGN with CryoSPARC.

External Jobs are special job types used to save these externally-processed results back to CryoSPARC. Read the examples or the API documentation for more details.