Working with Jobs#

Use this guide as a reference to programmatically build, run and inspect CryoSPARC jobs with cryosparc-tools. The following capabilities are covered:

  • Creating jobs

  • Setting job parameters

  • Connecting job inputs and outputs

  • Queuing and running jobs

  • Inspecting job outputs, files and assets

Note

If you have never worked with CryoSPARC jobs and results before, first read the Creating and Running Jobs page in the CryoSPARC Guide.

To get started, first initialize the CryoSPARC client from the cryosparc.tools module.

from cryosparc.tools import CryoSPARC

cs = CryoSPARC(host="cryoem0.sbi", base_port=40000)
assert cs.test_connection()
Connection succeeded to CryoSPARC command_core at http://cryoem0.sbi:40002
Connection succeeded to CryoSPARC command_vis at http://cryoem0.sbi:40003
Connection succeeded to CryoSPARC command_rtp at http://cryoem0.sbi:40005

Creating Jobs#

In CryoSPARC, cryo-EM data is processed by Jobs such as Import Movies and Ab-Initio Reconstruction. Jobs can import data from disk, process it and output Results that may be connected to other jobs.

Each job has an associated machine-readable type key that must be specified to create it. Show a table of available job types with the CryoSPARC.print_job_types() function. Optionally specify a section argument to only show job types from a specific section.

For example, to list available extraction and refinement job types:

cs.print_job_types(section=["extraction", "refinement"])
Section    | Job                              | Title                            
=================================================================================
extraction | extract_micrographs_multi        | Extract From Micrographs (GPU)   
           | extract_micrographs_cpu_parallel | Extract From Micrographs (CPU)   
           | downsample_particles             | Downsample Particles             
           | restack_particles                | Restack Particles                
refinement | homo_refine_new                  | Homogeneous Refinement           
           | hetero_refine                    | Heterogeneous Refinement         
           | nonuniform_refine_new            | Non-uniform Refinement           
           | homo_reconstruct                 | Homogeneous Reconstruction Only  
           | hetero_reconstruct               | Heterogeneous Reconstruction Only

This information is also available as a Python list with job.get_job_specs().

Use the CryoSPARC.find_project() function to load a project to work in:

project = cs.find_project("P251")

Create a new job with the project.create_job() function. Specify a workspace UID and a job type (such as one of the types listed above):

job = project.create_job("W1", "extract_micrographs_cpu_parallel")
job.uid, job.status
('J96', 'building')

Note the UID of the new job in the given workspace.

You may also use project.find_job() to load a job that was created manually in the CryoSPARC interface:

job = project.find_job("J96")
job.uid, job.type, job.status
('J96', 'extract_micrographs_cpu_parallel', 'building')

Setting Parameters#

A newly-created job has status building. You may change parameters and connect outputs while a job is in this mode.

Use job.print_param_spec() to show a table of available parameters. The first column lists the machine-readable parameter name that may be used to assign this value:

job.print_param_spec()
Param                 | Title                                  | Type    | Default
==================================================================================
bin_size_pix          | Fourier-crop to box size (pix)         | number  | None   
bin_size_pix_small    | Second (small) F-crop box size (pix)   | number  | None   
box_size_pix          | Extraction box size (pix)              | number  | 256    
compute_num_cores     | Number of CPU cores                    | number  | 4      
flip_x                | Flip mic. in x before extract?         | boolean | False  
flip_y                | Flip mic. in y before extract?         | boolean | False  
force_reextract_CTF   | Force re-extract CTFs from micrographs | boolean | False  
num_extract           | Number of mics to extract              | number  | None   
output_f16            | Save results in 16-bit floating point  | boolean | False  
recenter_using_shifts | Recenter using aligned shifts          | boolean | True   
scale_const_override  | Scale constant (override)              | number  | None   

This information is also available as a Python dictionary from job.doc.params_base.

Based on the titles, you may use the CryoSPARC web interface to browse detailed descriptions of these parameters.

Use job.set_param() to update a parameter. It returns True if the parameter was successfully updated:

job.set_param("box_size_pix", 448)
job.set_param("recenter_using_shifts", False)
True

Connecting Inputs and Outputs#

Most jobs also require inputs:

  • An Input is a connection to another parent job’s cryo-EM data Output

    • e.g., a list of micrographs, picked particles or a reconstructed volume

  • An Output is a group of low-level results produced when a job finishes running

  • Each Result includes various data and metadata about its parent output

    • e.g., motion correction information, computed CTF or particle blobs

  • Inputs have Slots that each correspond to an output result

In the CryoSPARC web interface, you may inspect the available inputs and outputs from a job’s “Inputs and Parameters” tab and “Outputs” tab, respectively.

With cryosparc-tools, use job.print_input_spec() to show a table of available input requirements for a given job:

job.print_input_spec()
Input       | Title       | Type     | Required? | Input Slots     | Slot Types      | Slot Required?
=====================================================================================================
micrographs | Micrographs | exposure | ✓ (1+)    | micrograph_blob | micrograph_blob | ✓             
            |             |          |           | background_blob | stat_blob       | ✕             
            |             |          |           | mscope_params   | mscope_params   | ✓             
            |             |          |           | ctf             | ctf             | ✕             
particles   | Particles   | particle | ✕ (0+)    | location        | location        | ✓             
            |             |          |           | alignments2D    | alignments2D    | ✕             
            |             |          |           | alignments3D    | alignments3D    | ✕             

This information is also available as a Python list from job.doc.input_slot_groups.

This example extraction job requires inputs micrographs and particles. These must be connected from one or more parent jobs that produce the same types (exposure and particle, respectively) as outputs, e.g., an Inspect Particle Picks job. Note also the required low-level slot connections:

  • Requires one or more connections from a job output with type exposure, i.e., CTF-corrected micrographs

    • Must include low-level slots micrograph_blob and mscope_params

    • May include optional low-level slots stat_blob and ctf

  • Requires zero or more connections from a job output with type particle, i.e., particle pick locations

    • Must include low-level slot location

    • May include optional low-level slots alignments2D and alignments3D

The job cannot run if the required low-level slots are not connected. If provided, optional low-level slots may be used by the job for additional computation and results. See the main CryoSPARC Guide for details about how inputs and slots are used by specific job types.
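The matching rule above boils down to a simple check: every required input slot must be covered by a slot of the candidate output. The following is an illustrative sketch of that rule with plain Python; the function and data below are hypothetical and not part of the cryosparc-tools API.

```python
# Hypothetical helper illustrating the slot-matching rule described above.
def missing_required_slots(required_slots, output_slots):
    """Return required input slots that the candidate output does not provide."""
    return [slot for slot, required in required_slots if required and slot not in output_slots]

# Required slots for the example "micrographs" input: (slot name, required?)
micrograph_input_slots = [
    ("micrograph_blob", True),
    ("background_blob", False),
    ("mscope_params", True),
    ("ctf", False),
]

# Slots provided by the parent job's "micrographs" output
parent_output_slots = {"micrograph_blob", "ctf", "mscope_params", "background_blob"}

print(missing_required_slots(micrograph_input_slots, parent_output_slots))  # []
```

An empty list means the connection is valid; any names returned are the missing required slots that would prevent the job from running.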

Load the job or jobs which will provide the required inputs:

parent_job = project.find_job("J13")
parent_job.type, parent_job.status
('inspect_picks_v2', 'completed')

Inspect its outputs with job.print_output_spec():

parent_job.print_output_spec()
Output      | Title                | Type     | Result Slots                 | Result Types   
==============================================================================================
micrographs | Micrographs accepted | exposure | micrograph_blob              | micrograph_blob
            |                      |          | ctf                          | ctf            
            |                      |          | mscope_params                | mscope_params  
            |                      |          | background_blob              | stat_blob      
            |                      |          | micrograph_thumbnail_blob_1x | thumbnail_blob 
            |                      |          | micrograph_thumbnail_blob_2x | thumbnail_blob 
            |                      |          | ctf_stats                    | ctf_stats      
            |                      |          | micrograph_blob_non_dw       | micrograph_blob
            |                      |          | rigid_motion                 | motion         
            |                      |          | spline_motion                | motion         
            |                      |          | movie_blob                   | movie_blob     
            |                      |          | gain_ref_blob                | gain_ref_blob  
particles   | Particles accepted   | particle | location                     | location       
            |                      |          | pick_stats                   | pick_stats     
            |                      |          | ctf                          | ctf            

This information is also available as a Python list from job.doc.output_result_groups.

The types of the two outputs micrographs and particles match the types of the two required inputs and also have all the required slots. Connect them to the parent job with the job.connect() function:

job.connect(
    target_input="micrographs",
    source_job_uid=parent_job.uid,
    source_output="micrographs",
)
job.connect(
    target_input="particles",
    source_job_uid=parent_job.uid,
    source_output="particles",
)
True

Note

The input and output names do not always match as they do in this case. For example, if the parent output is named micrographs_accepted, specify source_output="micrographs_accepted".

Queuing and Running#

Once parameters are set and required inputs are connected, the job is ready to run. Use the job.queue() function to send the job to the CryoSPARC scheduler for execution on a given compute node or cluster.

job.queue(lane="cryoem5")

Omit the lane argument to run directly on the current workstation or master. If required, wait until the job finishes with the job.wait_for_done() function:

job.wait_for_done(error_on_incomplete=True)
'completed'

The error_on_incomplete=True flag causes a Python exception if the job fails or is killed before completing successfully.

A running job may be killed with job.kill(). A queued, completed, killed or failed job may be cleared with job.clear(). After clearing, the job goes back to building status.

Inspecting Results#

While running, jobs produce various kinds of output files and associated metadata. These include:

  • Files such as motion-corrected micrographs, extracted particles, reconstructed volumes, etc.

  • .cs file datasets with computed metadata

  • Image assets and plots for display in the web interface

Use the job.list_files() function to get a list of files in the job’s directory:

job.list_files()
['J96_micrographs.csg',
 'J96_particles.csg',
 'J96_passthrough_micrographs.cs',
 'J96_passthrough_micrographs_incomplete.cs',
 'J96_passthrough_particles.cs',
 'events.bson',
 'extract',
 'extracted_particles.cs',
 'gridfs_data',
 'incomplete_micrographs.cs',
 'job.json',
 'job.log',
 'picked_micrographs.cs']

Specify a subfolder to show files in a specific subdirectory such as extract:

extracted = job.list_files("extract")
extracted[0]
'extract/002297077740060436393_14sep05c_c_00003gr_00014sq_00009hl_00004es.frames_patch_aligned_doseweighted_particles.mrc'

Any file in a job directory may be downloaded for inspection with job.download_file():

job.download_file(extracted[0], target="sample.mrc")
with open("sample.mrc", "rb") as f:
    print(f"Downloaded {len(f.read())} bytes")
Downloaded 496141312 bytes

target may be a file path or writable file handle. You may also use job.download_dataset() and job.load_output() to download .cs files directly into Dataset objects (details in the next section), or job.download_mrc() to download .mrc files as Numpy arrays:

header, data = job.download_mrc(extracted[0])
print(f"Downloaded {data.nbytes} byte particle stack with {header.nz} particles")
Downloaded 496140288 byte particle stack with 618 particles

Datasets#

All cryo-EM data processed in CryoSPARC have associated metadata and results that must be passed between jobs. CryoSPARC uses .cs Dataset files to do this.

A Dataset is a table where each row represents a unique cryo-EM data entity such as an exposure, particle, template, volume, etc. Each column is a data field associated with that entity such as path on disk, pixel size, dimensions, X/Y position, etc.

.cs files are binary encodings of this tabular data.
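Conceptually, a Dataset is columnar: a mapping from field names to equal-length columns, with one entry per cryo-EM entity. A minimal stand-in using plain Python lists (illustrative only; the real Dataset class stores numpy arrays):

```python
# Toy columnar table illustrating the Dataset layout described above.
# Field names and values are made up for illustration.
table = {
    "uid":          [101, 102, 103],
    "blob/path":    ["J96/extract/a.mrc", "J96/extract/b.mrc", "J96/extract/c.mrc"],
    "blob/psize_A": [0.6575, 0.6575, 0.6575],
}

def row(table, i):
    """Materialize row i as a dict, similar to iterating Dataset.rows()."""
    return {field: column[i] for field, column in table.items()}

print(row(table, 1)["blob/path"])  # J96/extract/b.mrc
```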

Use job.download_dataset() to load a .cs file from the job directory. Use pandas to inspect the downloaded dataset in Jupyter or ipython:

import pandas as pd

particles = job.download_dataset("extracted_particles.cs")
pd.DataFrame(particles.rows())
blob/idx blob/import_sig blob/path blob/psize_A blob/shape blob/sign location/center_x_frac location/center_y_frac location/exp_group_id location/micrograph_path location/micrograph_psize_A location/micrograph_shape location/micrograph_uid uid
0 0 0 J96/extract/009270517818331954156_14sep05c_000... 0.6575 [448, 448] -1.0 0.236207 0.701667 0 J2/motioncorrected/009270517818331954156_14sep... 0.0 [7676, 7420] 9270517818331954156 5182375780654809529
1 1 0 J96/extract/009270517818331954156_14sep05c_000... 0.6575 [448, 448] -1.0 0.934483 0.753333 0 J2/motioncorrected/009270517818331954156_14sep... 0.0 [7676, 7420] 9270517818331954156 12660056651751289214
2 2 0 J96/extract/009270517818331954156_14sep05c_000... 0.6575 [448, 448] -1.0 0.627586 0.163333 0 J2/motioncorrected/009270517818331954156_14sep... 0.0 [7676, 7420] 9270517818331954156 17971771557537199412
3 3 0 J96/extract/009270517818331954156_14sep05c_000... 0.6575 [448, 448] -1.0 0.413793 0.448333 0 J2/motioncorrected/009270517818331954156_14sep... 0.0 [7676, 7420] 9270517818331954156 17954957875627625872
4 4 0 J96/extract/009270517818331954156_14sep05c_000... 0.6575 [448, 448] -1.0 0.441379 0.311667 0 J2/motioncorrected/009270517818331954156_14sep... 0.0 [7676, 7420] 9270517818331954156 5996321661655483102
... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
12225 624 0 J96/extract/011310893595949852984_14sep05c_c_0... 0.6575 [448, 448] -1.0 0.951724 0.646667 0 J2/motioncorrected/011310893595949852984_14sep... 0.0 [7676, 7420] 11310893595949852984 13269417710913639089
12226 625 0 J96/extract/011310893595949852984_14sep05c_c_0... 0.6575 [448, 448] -1.0 0.439655 0.131667 0 J2/motioncorrected/011310893595949852984_14sep... 0.0 [7676, 7420] 11310893595949852984 12819948907579588581
12227 626 0 J96/extract/011310893595949852984_14sep05c_c_0... 0.6575 [448, 448] -1.0 0.627586 0.196667 0 J2/motioncorrected/011310893595949852984_14sep... 0.0 [7676, 7420] 11310893595949852984 4627702747760153532
12228 627 0 J96/extract/011310893595949852984_14sep05c_c_0... 0.6575 [448, 448] -1.0 0.551724 0.155000 0 J2/motioncorrected/011310893595949852984_14sep... 0.0 [7676, 7420] 11310893595949852984 13411574058928503699
12229 628 0 J96/extract/011310893595949852984_14sep05c_c_0... 0.6575 [448, 448] -1.0 0.968966 0.325000 0 J2/motioncorrected/011310893595949852984_14sep... 0.0 [7676, 7420] 11310893595949852984 510935515943909986

12230 rows × 14 columns

Each column name has the format {slot}/{field}, e.g., ctf/amp_contrast or blob/path. An alternative definition of a low-level result in CryoSPARC is all the fields in a result dataset that share the same prefix.
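Under that naming convention, the fields belonging to one low-level result can be recovered by grouping column names on their slot prefix. A small illustrative sketch, not a cryosparc-tools function:

```python
from collections import defaultdict

def group_fields_by_slot(field_names):
    """Group '{slot}/{field}' column names by slot prefix; 'uid' has no slot."""
    slots = defaultdict(list)
    for name in field_names:
        slot, _, field = name.partition("/")
        # Names without a "/" (like uid) go under the empty-string key
        slots[slot if field else ""].append(field or name)
    return dict(slots)

fields = ["uid", "blob/path", "blob/psize_A", "ctf/amp_contrast", "location/center_x_frac"]
print(group_fields_by_slot(fields))
# {'': ['uid'], 'blob': ['path', 'psize_A'], 'ctf': ['amp_contrast'], 'location': ['center_x_frac']}
```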

uid is a special numeric field which CryoSPARC uses to uniquely identify, join and de-duplicate metadata in input datasets.

Job output datasets are generally split up into two files:

  1. The main result dataset, which includes new data created by this job

  2. The passthrough dataset, which includes data inherited from the input dataset (.cs files with “passthrough” in the file name)
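Conceptually, combining the main and passthrough datasets is a join on the shared uid field. An illustrative sketch with plain dicts and made-up values (CryoSPARC performs the real merge internally):

```python
# Illustrative only: join main and passthrough rows on the shared uid field.
main = [
    {"uid": 101, "blob/path": "J96/extract/a.mrc"},
    {"uid": 102, "blob/path": "J96/extract/b.mrc"},
]
passthrough = [
    {"uid": 101, "ctf/df1_A": 12453.6},
    {"uid": 102, "ctf/df1_A": 12321.4},
]

# Index passthrough rows by uid, then merge each main row with its match
by_uid = {row["uid"]: row for row in passthrough}
combined = [{**row, **by_uid[row["uid"]]} for row in main]
print(combined[0])  # {'uid': 101, 'blob/path': 'J96/extract/a.mrc', 'ctf/df1_A': 12453.6}
```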

Use the job.load_output() function, which combines these two datasets and allows filtering for specific result slots. This provides a more convenient interface than job.download_dataset():

particles = job.load_output("particles", slots=["location", "ctf"])
pd.DataFrame(particles.rows())
ctf/accel_kv ctf/amp_contrast ctf/anisomag ctf/bfactor ctf/cs_mm ctf/df1_A ctf/df2_A ctf/df_angle_rad ctf/exp_group_id ctf/phase_shift_rad ... location/exp_group_id location/micrograph_path location/micrograph_shape location/micrograph_uid location/min_dist_A pick_stats/angle_rad pick_stats/ncc_score pick_stats/power pick_stats/template_idx uid
0 300.0 0.1 [0.0, 0.0, 0.0, 0.0] 0.0 2.7 12453.643555 12341.103516 4.694420 0 0.0 ... 0 J2/motioncorrected/009270517818331954156_14sep... [7676, 7420] 9270517818331954156 100.0 0.000000 0.821854 686.504211 3 5182375780654809529
1 300.0 0.1 [0.0, 0.0, 0.0, 0.0] 0.0 2.7 12321.397461 12208.857422 4.694420 0 0.0 ... 0 J2/motioncorrected/009270517818331954156_14sep... [7676, 7420] 9270517818331954156 100.0 0.000000 0.795290 801.999390 3 12660056651751289214
2 300.0 0.1 [0.0, 0.0, 0.0, 0.0] 0.0 2.7 12363.504883 12250.964844 4.694420 0 0.0 ... 0 J2/motioncorrected/009270517818331954156_14sep... [7676, 7420] 9270517818331954156 100.0 0.959931 0.764936 729.822632 3 17971771557537199412
3 300.0 0.1 [0.0, 0.0, 0.0, 0.0] 0.0 2.7 12401.663086 12289.123047 4.694420 0 0.0 ... 0 J2/motioncorrected/009270517818331954156_14sep... [7676, 7420] 9270517818331954156 100.0 2.617994 0.746300 778.620056 3 17954957875627625872
4 300.0 0.1 [0.0, 0.0, 0.0, 0.0] 0.0 2.7 12421.460938 12308.920898 4.694420 0 0.0 ... 0 J2/motioncorrected/009270517818331954156_14sep... [7676, 7420] 9270517818331954156 100.0 0.436332 0.720111 862.048706 3 5996321661655483102
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
12225 300.0 0.1 [0.0, 0.0, 0.0, 0.0] 0.0 2.7 15801.009766 15617.719727 -1.555879 0 0.0 ... 0 J2/motioncorrected/011310893595949852984_14sep... [7676, 7420] 11310893595949852984 100.0 1.570796 0.202069 727.920898 3 13269417710913639089
12226 300.0 0.1 [0.0, 0.0, 0.0, 0.0] 0.0 2.7 15720.578125 15537.288086 -1.555879 0 0.0 ... 0 J2/motioncorrected/011310893595949852984_14sep... [7676, 7420] 11310893595949852984 100.0 4.886922 0.201657 779.085510 3 12819948907579588581
12227 300.0 0.1 [0.0, 0.0, 0.0, 0.0] 0.0 2.7 15746.330078 15563.040039 -1.555879 0 0.0 ... 0 J2/motioncorrected/011310893595949852984_14sep... [7676, 7420] 11310893595949852984 100.0 2.443461 0.200585 772.648499 3 4627702747760153532
12228 300.0 0.1 [0.0, 0.0, 0.0, 0.0] 0.0 2.7 15743.455078 15560.165039 -1.555879 0 0.0 ... 0 J2/motioncorrected/011310893595949852984_14sep... [7676, 7420] 11310893595949852984 100.0 1.570796 0.198699 871.016724 3 13411574058928503699
12229 300.0 0.1 [0.0, 0.0, 0.0, 0.0] 0.0 2.7 15814.901367 15631.611328 -1.555879 0 0.0 ... 0 J2/motioncorrected/011310893595949852984_14sep... [7676, 7420] 11310893595949852984 100.0 5.323254 0.198553 646.006897 3 510935515943909986

12230 rows × 29 columns

load_output includes all created and passthrough metadata if slots is not provided.

Dataset contents may be accessed as a dictionary of columns, where each column is a numpy array. Example data access:

expgroups = particles["ctf/exp_group_id"]  # read column
particles["ctf/exp_group_id"][:] = 42  # write column
particles["ctf/exp_group_id"][42] = 1  # write cell
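The column read/write semantics above mirror those of ordinary numpy arrays, so slicing and boolean masking work as expected. A stand-alone illustration using a plain numpy array in place of a real Dataset column:

```python
import numpy as np

# Stand-in for a Dataset column such as particles["ctf/exp_group_id"]
exp_group_id = np.zeros(5, dtype=np.uint32)

exp_group_id[:] = 42           # write the whole column
exp_group_id[2] = 1            # write one cell
selected = exp_group_id == 42  # boolean mask, usable for filtering rows
print(exp_group_id.tolist(), int(selected.sum()))  # [42, 42, 1, 42, 42] 4
```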

If required, use an External job to save modified datasets back to CryoSPARC.

See the Dataset API documentation for all available dataset operations.

Assets#

Jobs may produce image files, plots and other miscellaneous output data that is accessible from the web interface when inspecting a job. These assets are not available on the file system; instead, CryoSPARC stores them in its MongoDB database for fast, frequent access.

Use the job.list_assets() function to view available assets for a job:

assets = job.list_assets()
assets[0]
{'_id': '6560d183562b2c67c7d35754',
 'chunkSize': 2096128,
 'contentType': 'image/png',
 'filename': 'J96_extracted_coordinates_on_j2motioncorrected009270517818331954156_14sep05c_00024sq_00003hl_00002esframes_patch_aligned_doseweightedmrc.png',
 'job_uid': 'J96',
 'length': 867617,
 'md5': '471ab293b92726043c8277cb6964f70b',
 'project_uid': 'P251',
 'uploadDate': '2023-11-24T16:38:27.800000'}

Similar to job.download_file(), download an asset to disk with the job.download_asset() function, providing the asset ID and download location:

job.download_asset(assets[0]["_id"], "image.png")
PosixPath('image.png')

External Jobs#

cryosparc-tools may be integrated into custom cryo-EM workflows to load, modify and save CryoSPARC job results. It may also be used to integrate third-party cryo-EM tools such as Motioncor2, crYOLO or cryoDRGN with CryoSPARC.

External Jobs are special job types used to save these externally-processed results back to CryoSPARC. Read the examples or the API documentation for more details.