Working with Jobs#
Use this guide as a reference to programmatically build, run and inspect CryoSPARC jobs with cryosparc-tools. The following capabilities are covered:
Creating jobs
Setting job parameters
Connecting job inputs and outputs
Queuing and running jobs
Inspecting job outputs, files and assets
Note
If you have never worked with CryoSPARC jobs and results before, first read the Creating and Running Jobs page in the CryoSPARC Guide.
To get started, first initialize the CryoSPARC client from the cryosparc.tools module.
from cryosparc.tools import CryoSPARC
cs = CryoSPARC(host="cryoem0.sbi", base_port=40000)
assert cs.test_connection()
Connection succeeded to CryoSPARC command_core at http://cryoem0.sbi:40002
Connection succeeded to CryoSPARC command_vis at http://cryoem0.sbi:40003
Connection succeeded to CryoSPARC command_rtp at http://cryoem0.sbi:40005
Creating Jobs#
In CryoSPARC, cryo-EM data is processed by Jobs such as Import Movies and Ab-Initio Reconstruction. Jobs can import data from disk, process it and output Results that may be connected to other jobs.
Each job has an associated machine-readable type key that must be specified to create it. Display a table of the job types available to create with the CryoSPARC.print_job_types() function. Optionally specify a section argument to only show job types from a specific section.
For example, to list available extraction and refinement job types:
cs.print_job_types(section=["extraction", "refinement"])
Section | Job | Title
=================================================================================
extraction | extract_micrographs_multi | Extract From Micrographs (GPU)
| extract_micrographs_cpu_parallel | Extract From Micrographs (CPU)
| downsample_particles | Downsample Particles
| restack_particles | Restack Particles
refinement | homo_refine_new | Homogeneous Refinement
| hetero_refine | Heterogeneous Refinement
| nonuniform_refine_new | Non-uniform Refinement
| homo_reconstruct | Homogeneous Reconstruction Only
| hetero_reconstruct | Heterogeneous Reconstruction Only
This information is also available as a Python list with cs.get_job_specs().
Use the CryoSPARC.find_project() function to load a project to work in:
project = cs.find_project("P251")
Create a new job with the project.create_job() function. Specify a workspace UID and a job type (such as one of the types listed above):
job = project.create_job("W1", "extract_micrographs_cpu_parallel")
job.uid, job.status
('J96', 'building')
Note the UID of the new job in the given workspace.
You may also use project.find_job() to load a job that was manually created in the CryoSPARC interface:
job = project.find_job("J96")
job.uid, job.type, job.status
('J96', 'extract_micrographs_cpu_parallel', 'building')
Setting Parameters#
A newly-created job has status building. You may change parameters and connect inputs while a job is in this mode.
Use job.print_param_spec() to show a table of available parameters. The first column lists the machine-readable parameter name that may be used to assign each value:
job.print_param_spec()
Param | Title | Type | Default
==================================================================================
bin_size_pix | Fourier-crop to box size (pix) | number | None
bin_size_pix_small | Second (small) F-crop box size (pix) | number | None
box_size_pix | Extraction box size (pix) | number | 256
compute_num_cores | Number of CPU cores | number | 4
flip_x | Flip mic. in x before extract? | boolean | False
flip_y | Flip mic. in y before extract? | boolean | False
force_reextract_CTF | Force re-extract CTFs from micrographs | boolean | False
num_extract | Number of mics to extract | number | None
output_f16 | Save results in 16-bit floating point | boolean | False
recenter_using_shifts | Recenter using aligned shifts | boolean | True
scale_const_override | Scale constant (override) | number | None
This information is also available as a Python dictionary from job.doc.params_base.
Based on the title, you may use the CryoSPARC web interface to browse detailed descriptions of these parameters.
Use job.set_param() to update a parameter. It returns True if the parameter was successfully updated:
job.set_param("box_size_pix", 448)
job.set_param("recenter_using_shifts", False)
True
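When a job takes many parameters, batching the set_param() calls with a small helper keeps failures visible. A minimal sketch, assuming only that job.set_param() returns True on success (the helper name apply_params is ours, not part of the API):

```python
def apply_params(job, params):
    """Set each parameter on a building job; return the names that failed to update."""
    failed = []
    for name, value in params.items():
        if not job.set_param(name, value):  # set_param returns True on success
            failed.append(name)
    return failed
```

For example, apply_params(job, {"box_size_pix": 448, "recenter_using_shifts": False}) returns an empty list when every parameter is accepted.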
Connecting Inputs and Outputs#
Most jobs also require inputs:

- An Input is a connection to another parent job’s cryo-EM data Output
  - e.g., a list of micrographs, picked particles or a reconstructed volume
- An Output is a group of low-level results produced when a job finishes running
- Each Result includes various data and metadata about its parent output
  - e.g., motion correction information, computed CTF or particle blobs
- Inputs have Slots that each correspond to an output result
In the CryoSPARC web interface, you may inspect the available inputs and outputs from a job’s “Inputs and Parameters” tab and “Outputs” tab, respectively.
With cryosparc-tools, use job.print_input_spec() to show a table of available input requirements for a given job.
job.print_input_spec()
Input | Title | Type | Required? | Input Slots | Slot Types | Slot Required?
=====================================================================================================
micrographs | Micrographs | exposure | ✓ (1+) | micrograph_blob | micrograph_blob | ✓
| | | | background_blob | stat_blob | ✕
| | | | mscope_params | mscope_params | ✓
| | | | ctf | ctf | ✕
particles | Particles | particle | ✕ (0+) | location | location | ✓
| | | | alignments2D | alignments2D | ✕
| | | | alignments3D | alignments3D | ✕
This information is also available as a Python list from job.doc.input_slot_groups.
This example extraction job requires inputs micrographs and particles. These must be connected from one or more parent jobs that produce the same types (exposure and particle, respectively) as outputs, e.g., an Inspect Particle Picks job. Note also the required low-level slot connections:

- micrographs requires one or more connections from a job output with type exposure, i.e., CTF-corrected micrographs
  - Must include low-level slots micrograph_blob and mscope_params
  - May include optional low-level slots stat_blob and ctf
- particles requires zero or more connections from a job output with type particle, i.e., particle pick locations
  - Must include low-level slot location
  - May include optional low-level slots alignments2D and alignments3D

The job cannot run if the required low-level slots are not connected. If provided, optional low-level slots may be used by the job for additional computation and results. See the main CryoSPARC Guide for details about how inputs and slots are used by specific job types.
Load the job or jobs which will provide the required inputs:
parent_job = project.find_job("J13")
parent_job.type, parent_job.status
('inspect_picks_v2', 'completed')
Inspect its outputs with job.print_output_spec():
parent_job.print_output_spec()
Output | Title | Type | Result Slots | Result Types
==============================================================================================
micrographs | Micrographs accepted | exposure | micrograph_blob | micrograph_blob
| | | ctf | ctf
| | | mscope_params | mscope_params
| | | background_blob | stat_blob
| | | micrograph_thumbnail_blob_1x | thumbnail_blob
| | | micrograph_thumbnail_blob_2x | thumbnail_blob
| | | ctf_stats | ctf_stats
| | | micrograph_blob_non_dw | micrograph_blob
| | | rigid_motion | motion
| | | spline_motion | motion
| | | movie_blob | movie_blob
| | | gain_ref_blob | gain_ref_blob
particles | Particles accepted | particle | location | location
| | | pick_stats | pick_stats
| | | ctf | ctf
This information is also available as a Python list from job.doc.output_result_groups.
The types of the two outputs micrographs and particles match the types of the two required inputs and also have all the required slots. Connect them to the new extraction job with the job.connect() function:
job.connect(
target_input="micrographs",
source_job_uid=parent_job.uid,
source_output="micrographs",
)
job.connect(
target_input="particles",
source_job_uid=parent_job.uid,
source_output="particles",
)
True
Note
The input and output names do not always match as they happen to here. e.g., if the parent output were named micrographs_accepted, specify source_output="micrographs_accepted".
Queuing and Running#
Once parameters are set and required inputs are connected, the job is ready to run. Use the job.queue()
function to send the job to the CryoSPARC scheduler for execution on a given compute node or cluster.
job.queue(lane="cryoem5")
Omit the lane argument to run directly on the current workstation or master. If required, wait until the job finishes with the job.wait_for_done() function:
job.wait_for_done(error_on_incomplete=True)
'completed'
The error_on_incomplete=True flag causes a Python exception if the job fails or is killed before completing successfully.
A running job may be killed with job.kill(). A queued, completed, killed or failed job may be cleared with job.clear(). After clearing, the job goes back to building status.
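The queue, wait and clear calls above compose into a simple retry loop. A sketch under the behavior just described, where wait_for_done(error_on_incomplete=True) raises on failure and clear() returns the job to building status; the function name is ours:

```python
def run_with_retries(job, lane=None, attempts=2):
    """Queue a job and wait for it; clear and re-queue on failure."""
    for attempt in range(1, attempts + 1):
        job.queue(lane=lane)
        try:
            return job.wait_for_done(error_on_incomplete=True)
        except Exception:
            if attempt == attempts:
                raise  # out of retries; propagate the failure
            job.clear()  # job goes back to 'building' and may be queued again
```

For example, run_with_retries(job, lane="cryoem5") behaves like a queue-and-wait, but tolerates one transient failure.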
Inspecting Results#
While running, jobs produce various kinds of output files and associated metadata. These include:

- Files such as motion-corrected micrographs, extracted particles, reconstructed volumes, etc.
- .cs file datasets with computed metadata
- Image assets and plots for display in the web interface
Use the job.list_files() function to get a list of files in the job’s directory:
job.list_files()
['J96_micrographs.csg',
'J96_particles.csg',
'J96_passthrough_micrographs.cs',
'J96_passthrough_micrographs_incomplete.cs',
'J96_passthrough_particles.cs',
'events.bson',
'extract',
'extracted_particles.cs',
'gridfs_data',
'incomplete_micrographs.cs',
'job.json',
'job.log',
'picked_micrographs.cs']
Specify a subfolder to list files in a specific subdirectory such as extract:
extracted = job.list_files("extract")
extracted[0]
'extract/002297077740060436393_14sep05c_c_00003gr_00014sq_00009hl_00004es.frames_patch_aligned_doseweighted_particles.mrc'
Any file in a job directory may be downloaded for inspection with job.download_file():
job.download_file(extracted[0], target="sample.mrc")
with open("sample.mrc", "rb") as f:
print(f"Downloaded {len(f.read())} bytes")
Downloaded 496141312 bytes
target may be a file path or writable file handle. You may also use job.download_dataset() and job.load_output() to download .cs files directly into Dataset objects (details in the next section), or job.download_mrc() to download .mrc files as Numpy arrays:
header, data = job.download_mrc(extracted[0])
print(f"Downloaded {data.nbytes} byte particle stack with {header.nz} particles")
Downloaded 496140288 byte particle stack with 618 particles
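The header and numpy array returned by download_mrc() can be analyzed directly, for example to compute per-particle intensity statistics over the (nz, ny, nx) stack. A minimal sketch (the helper name is ours):

```python
import numpy as np

def particle_stats(stack):
    """Per-particle mean and standard deviation for an (nz, ny, nx) stack,
    such as the array returned by job.download_mrc()."""
    arr = np.asarray(stack)
    return arr.mean(axis=(1, 2)), arr.std(axis=(1, 2))
```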
Datasets#
All cryo-EM data processed in CryoSPARC have associated metadata and results that must be passed between jobs. CryoSPARC uses .cs Dataset files to do this.
A Dataset is a table where each row represents a unique cryo-EM data entity such as an exposure, particle, template, volume, etc. Each column is a data field associated with that entity, such as its path on disk, pixel size, dimensions or X/Y position. .cs files are binary encodings of this tabular data.
Use job.download_dataset() to load a .cs file from the job directory. Use pandas to inspect the downloaded dataset in Jupyter or IPython:
import pandas as pd
particles = job.download_dataset("extracted_particles.cs")
pd.DataFrame(particles.rows())
blob/idx | blob/import_sig | blob/path | blob/psize_A | blob/shape | blob/sign | location/center_x_frac | location/center_y_frac | location/exp_group_id | location/micrograph_path | location/micrograph_psize_A | location/micrograph_shape | location/micrograph_uid | uid | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 0 | 0 | J96/extract/009270517818331954156_14sep05c_000... | 0.6575 | [448, 448] | -1.0 | 0.236207 | 0.701667 | 0 | J2/motioncorrected/009270517818331954156_14sep... | 0.0 | [7676, 7420] | 9270517818331954156 | 5182375780654809529 |
1 | 1 | 0 | J96/extract/009270517818331954156_14sep05c_000... | 0.6575 | [448, 448] | -1.0 | 0.934483 | 0.753333 | 0 | J2/motioncorrected/009270517818331954156_14sep... | 0.0 | [7676, 7420] | 9270517818331954156 | 12660056651751289214 |
2 | 2 | 0 | J96/extract/009270517818331954156_14sep05c_000... | 0.6575 | [448, 448] | -1.0 | 0.627586 | 0.163333 | 0 | J2/motioncorrected/009270517818331954156_14sep... | 0.0 | [7676, 7420] | 9270517818331954156 | 17971771557537199412 |
3 | 3 | 0 | J96/extract/009270517818331954156_14sep05c_000... | 0.6575 | [448, 448] | -1.0 | 0.413793 | 0.448333 | 0 | J2/motioncorrected/009270517818331954156_14sep... | 0.0 | [7676, 7420] | 9270517818331954156 | 17954957875627625872 |
4 | 4 | 0 | J96/extract/009270517818331954156_14sep05c_000... | 0.6575 | [448, 448] | -1.0 | 0.441379 | 0.311667 | 0 | J2/motioncorrected/009270517818331954156_14sep... | 0.0 | [7676, 7420] | 9270517818331954156 | 5996321661655483102 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
12225 | 624 | 0 | J96/extract/011310893595949852984_14sep05c_c_0... | 0.6575 | [448, 448] | -1.0 | 0.951724 | 0.646667 | 0 | J2/motioncorrected/011310893595949852984_14sep... | 0.0 | [7676, 7420] | 11310893595949852984 | 13269417710913639089 |
12226 | 625 | 0 | J96/extract/011310893595949852984_14sep05c_c_0... | 0.6575 | [448, 448] | -1.0 | 0.439655 | 0.131667 | 0 | J2/motioncorrected/011310893595949852984_14sep... | 0.0 | [7676, 7420] | 11310893595949852984 | 12819948907579588581 |
12227 | 626 | 0 | J96/extract/011310893595949852984_14sep05c_c_0... | 0.6575 | [448, 448] | -1.0 | 0.627586 | 0.196667 | 0 | J2/motioncorrected/011310893595949852984_14sep... | 0.0 | [7676, 7420] | 11310893595949852984 | 4627702747760153532 |
12228 | 627 | 0 | J96/extract/011310893595949852984_14sep05c_c_0... | 0.6575 | [448, 448] | -1.0 | 0.551724 | 0.155000 | 0 | J2/motioncorrected/011310893595949852984_14sep... | 0.0 | [7676, 7420] | 11310893595949852984 | 13411574058928503699 |
12229 | 628 | 0 | J96/extract/011310893595949852984_14sep05c_c_0... | 0.6575 | [448, 448] | -1.0 | 0.968966 | 0.325000 | 0 | J2/motioncorrected/011310893595949852984_14sep... | 0.0 | [7676, 7420] | 11310893595949852984 | 510935515943909986 |
12230 rows × 14 columns
Each column name has the format {slot}/{field}, e.g., ctf/amp_contrast or blob/path. An alternative definition for a low-level result in CryoSPARC is the set of all fields in a result dataset that share the same prefix.
uid is a special numeric field which CryoSPARC uses to uniquely identify, join and de-duplicate metadata in input datasets.
Job output datasets are generally split into two files:

- The main result dataset, which includes new data created by this job
- The passthrough dataset, which includes data inherited from the input dataset (.cs files with “passthrough” in the file name)
Use the job.load_output() function, which combines these two datasets and allows filtering for specific result slots. This provides a more convenient interface than job.download_dataset():
particles = job.load_output("particles", slots=["location", "ctf"])
pd.DataFrame(particles.rows())
ctf/accel_kv | ctf/amp_contrast | ctf/anisomag | ctf/bfactor | ctf/cs_mm | ctf/df1_A | ctf/df2_A | ctf/df_angle_rad | ctf/exp_group_id | ctf/phase_shift_rad | ... | location/exp_group_id | location/micrograph_path | location/micrograph_shape | location/micrograph_uid | location/min_dist_A | pick_stats/angle_rad | pick_stats/ncc_score | pick_stats/power | pick_stats/template_idx | uid | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 300.0 | 0.1 | [0.0, 0.0, 0.0, 0.0] | 0.0 | 2.7 | 12453.643555 | 12341.103516 | 4.694420 | 0 | 0.0 | ... | 0 | J2/motioncorrected/009270517818331954156_14sep... | [7676, 7420] | 9270517818331954156 | 100.0 | 0.000000 | 0.821854 | 686.504211 | 3 | 5182375780654809529 |
1 | 300.0 | 0.1 | [0.0, 0.0, 0.0, 0.0] | 0.0 | 2.7 | 12321.397461 | 12208.857422 | 4.694420 | 0 | 0.0 | ... | 0 | J2/motioncorrected/009270517818331954156_14sep... | [7676, 7420] | 9270517818331954156 | 100.0 | 0.000000 | 0.795290 | 801.999390 | 3 | 12660056651751289214 |
2 | 300.0 | 0.1 | [0.0, 0.0, 0.0, 0.0] | 0.0 | 2.7 | 12363.504883 | 12250.964844 | 4.694420 | 0 | 0.0 | ... | 0 | J2/motioncorrected/009270517818331954156_14sep... | [7676, 7420] | 9270517818331954156 | 100.0 | 0.959931 | 0.764936 | 729.822632 | 3 | 17971771557537199412 |
3 | 300.0 | 0.1 | [0.0, 0.0, 0.0, 0.0] | 0.0 | 2.7 | 12401.663086 | 12289.123047 | 4.694420 | 0 | 0.0 | ... | 0 | J2/motioncorrected/009270517818331954156_14sep... | [7676, 7420] | 9270517818331954156 | 100.0 | 2.617994 | 0.746300 | 778.620056 | 3 | 17954957875627625872 |
4 | 300.0 | 0.1 | [0.0, 0.0, 0.0, 0.0] | 0.0 | 2.7 | 12421.460938 | 12308.920898 | 4.694420 | 0 | 0.0 | ... | 0 | J2/motioncorrected/009270517818331954156_14sep... | [7676, 7420] | 9270517818331954156 | 100.0 | 0.436332 | 0.720111 | 862.048706 | 3 | 5996321661655483102 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
12225 | 300.0 | 0.1 | [0.0, 0.0, 0.0, 0.0] | 0.0 | 2.7 | 15801.009766 | 15617.719727 | -1.555879 | 0 | 0.0 | ... | 0 | J2/motioncorrected/011310893595949852984_14sep... | [7676, 7420] | 11310893595949852984 | 100.0 | 1.570796 | 0.202069 | 727.920898 | 3 | 13269417710913639089 |
12226 | 300.0 | 0.1 | [0.0, 0.0, 0.0, 0.0] | 0.0 | 2.7 | 15720.578125 | 15537.288086 | -1.555879 | 0 | 0.0 | ... | 0 | J2/motioncorrected/011310893595949852984_14sep... | [7676, 7420] | 11310893595949852984 | 100.0 | 4.886922 | 0.201657 | 779.085510 | 3 | 12819948907579588581 |
12227 | 300.0 | 0.1 | [0.0, 0.0, 0.0, 0.0] | 0.0 | 2.7 | 15746.330078 | 15563.040039 | -1.555879 | 0 | 0.0 | ... | 0 | J2/motioncorrected/011310893595949852984_14sep... | [7676, 7420] | 11310893595949852984 | 100.0 | 2.443461 | 0.200585 | 772.648499 | 3 | 4627702747760153532 |
12228 | 300.0 | 0.1 | [0.0, 0.0, 0.0, 0.0] | 0.0 | 2.7 | 15743.455078 | 15560.165039 | -1.555879 | 0 | 0.0 | ... | 0 | J2/motioncorrected/011310893595949852984_14sep... | [7676, 7420] | 11310893595949852984 | 100.0 | 1.570796 | 0.198699 | 871.016724 | 3 | 13411574058928503699 |
12229 | 300.0 | 0.1 | [0.0, 0.0, 0.0, 0.0] | 0.0 | 2.7 | 15814.901367 | 15631.611328 | -1.555879 | 0 | 0.0 | ... | 0 | J2/motioncorrected/011310893595949852984_14sep... | [7676, 7420] | 11310893595949852984 | 100.0 | 5.323254 | 0.198553 | 646.006897 | 3 | 510935515943909986 |
12230 rows × 29 columns
load_output includes all created and passthrough metadata if slots is not provided.
Dataset contents may be accessed as a dictionary of columns, where each column is a numpy array. Example data access:
expgroups = particles["ctf/exp_group_id"] # read column
particles["ctf/exp_group_id"][:] = 42 # write column
particles["ctf/exp_group_id"][42] = 1 # write cell
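Because each column is a plain numpy array, ordinary numpy reductions and boolean masks apply directly. A small sketch (the helper name is ours; the pick_stats/ncc_score field appears in the table above):

```python
import numpy as np

def ncc_summary(particles, threshold=0.5):
    """Count rows whose pick NCC score exceeds `threshold`, plus the mean score.

    `particles` is any mapping of field name to numpy column, such as the
    dataset loaded with job.load_output() above.
    """
    scores = np.asarray(particles["pick_stats/ncc_score"])
    return int((scores > threshold).sum()), float(scores.mean())
```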
If required, use an External job to save modified datasets back to CryoSPARC.
See the Dataset API documentation for all available dataset operations.
Assets#
Jobs may produce image files, plots and other miscellaneous output data that is accessible from the web interface when inspecting a job. These assets are not available on the file system; instead, CryoSPARC stores them in its MongoDB database for fast, frequent access.
Use the job.list_assets() function to view available assets for a job:
assets = job.list_assets()
assets[0]
{'_id': '6560d183562b2c67c7d35754',
'chunkSize': 2096128,
'contentType': 'image/png',
'filename': 'J96_extracted_coordinates_on_j2motioncorrected009270517818331954156_14sep05c_00024sq_00003hl_00002esframes_patch_aligned_doseweightedmrc.png',
'job_uid': 'J96',
'length': 867617,
'md5': '471ab293b92726043c8277cb6964f70b',
'project_uid': 'P251',
'uploadDate': '2023-11-24T16:38:27.800000'}
Similar to job.download_file(), download an asset to disk with the job.download_asset() function, providing the asset ID and download location:
job.download_asset(assets[0]["_id"], "image.png")
PosixPath('image.png')
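To fetch every asset at once, the list and download calls compose naturally. A sketch using only the _id and filename keys shown in the list_assets() output above (the helper name is ours):

```python
from pathlib import Path

def download_all_assets(job, dest="assets"):
    """Download every asset of a job into `dest`; return the saved paths.

    Uses each asset's stored `filename` as the on-disk name.
    """
    out = Path(dest)
    out.mkdir(parents=True, exist_ok=True)
    return [
        job.download_asset(asset["_id"], out / asset["filename"])
        for asset in job.list_assets()
    ]
```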
External Jobs#
cryosparc-tools may be integrated into custom cryo-EM workflows to load, modify and save CryoSPARC job results. It may also be used to integrate third-party cryo-EM tools such as MotionCor2, crYOLO or cryoDRGN with CryoSPARC.
External Jobs are special job types used to save these externally-processed results back to CryoSPARC. Read the examples or the API documentation for more details.