Working with Jobs#
Use this guide as a reference to programmatically build, run and inspect CryoSPARC jobs with cryosparc-tools. The following capabilities are covered:
Creating jobs
Setting job parameters
Connecting job inputs and outputs
Queuing and running jobs
Inspecting job outputs, files and assets
Note
If you have never worked with CryoSPARC jobs and results before, first read the Creating and Running Jobs page in the CryoSPARC Guide.
To get started, first initialize the CryoSPARC client from the cryosparc.tools module.
from cryosparc.tools import CryoSPARC
cs = CryoSPARC("http://cryoem0.sbi:61000")
assert cs.test_connection()
Success: Connected to CryoSPARC API at http://cryoem0.sbi:61000
Use the CryoSPARC.find_project() function to load a project to work in:
project = cs.find_project("P75")
Browsing Jobs#
Use the project.find_jobs() function to get an iterable sequence of jobs within a project:
jobs = project.find_jobs()
job1 = next(jobs)
job2 = next(jobs)
job3 = next(jobs)
print(f"First job: {job1.uid}, {job1.type}")
print(f"Second job: {job2.uid}, {job2.type}")
print(f"Third job: {job3.uid}, {job3.type}")
First job: J259, import_movies
Second job: J260, patch_motion_correction_multi
Third job: J261, patch_ctf_estimation_multi
Specify filters like workspace_uid, type or category to only show a subset of jobs that match those filters.
for job in project.find_jobs(workspace_uid="W40", category="motion_correction"):
    print(f"Job: {job.uid}, {job.type}")
Job: J334, patch_motion_correction_multi
Job: J335, rigid_motion_correction_multi
Similar find functions are available for projects and workspaces.
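Because find_jobs() returns an iterable, its results can be passed straight to standard Python tools. The following is a minimal sketch that tallies jobs by type; count_jobs_by_type is a hypothetical helper name, not part of cryosparc-tools, and it assumes only the find_jobs() filters and the job.type attribute shown above.

```python
from collections import Counter


def count_jobs_by_type(project, **filters):
    """Tally job types for any object exposing find_jobs(), such as the project above.

    Keyword filters (e.g. workspace_uid="W40") are passed through unchanged.
    """
    return Counter(job.type for job in project.find_jobs(**filters))


# Possible usage against a live project (requires a CryoSPARC connection):
# for job_type, n in count_jobs_by_type(project, workspace_uid="W40").most_common():
#     print(f"{job_type}: {n}")
```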
Creating Jobs#
In CryoSPARC, cryo-EM data is processed by Jobs such as Import Movies and Ab-Initio Reconstruction. Jobs can import data from disk, process it and output Results that may be connected to other jobs.
Each job has an associated machine-readable type key that must be specified to create it. Show a table of job types available to create with the CryoSPARC.print_job_types() function. Optionally specify a category argument to only show job types from specific categories.
For example, to list available extraction and refinement job types:
cs.print_job_types(category=["extraction", "refinement"])
Category | Job | Title | Stability
============================================================================================
extraction | extract_micrographs_multi | Extract From Micrographs (GPU) | stable
| extract_micrographs_cpu_parallel | Extract From Micrographs (CPU) | stable
| downsample_particles | Downsample Particles | stable
| restack_particles | Restack Particles | stable
refinement | homo_refine_new | Homogeneous Refinement | stable
| hetero_refine | Heterogeneous Refinement | stable
| nonuniform_refine_new | Non-uniform Refinement | stable
| homo_reconstruct | Homogeneous Reconstruction Only | stable
| hetero_reconstruct_new | Heterogenous Reconstruction Only | stable
This information is also available as a Python list with cs.job_register.
Create a new job with the project.create_job() function. Specify a workspace UID and a job type (such as one of the types listed above):
job = project.create_job("W40", "extract_micrographs_cpu_parallel")
job.uid, job.status
('J1405', 'building')
Note the UID of the new job in the given workspace.
You may also use project.find_job() to load a job that was manually created in the CryoSPARC interface:
job = project.find_job("J1405")
job.uid, job.type, job.status
('J1405', 'extract_micrographs_cpu_parallel', 'building')
Setting Parameters#
A newly created job has status building. You may change parameters and connect inputs while a job has this status.
Use job.print_param_spec() to show a table of available parameters. The first column lists the machine-readable parameter name that may be used to assign this value:
job.print_param_spec()
Param | Title | Type | Default
==================================================================================
compute_num_cores | Number of CPU cores | integer | 4
box_size_pix | Extraction box size (pix) | integer | 256
bin_size_pix | Fourier-crop to box size (pix) | integer | None
bin_size_pix_small | Second (small) F-crop box size (pix) | integer | None
output_f16 | Save results in 16-bit floating point | boolean | 0
force_reextract_CTF | Force re-extract CTFs from micrographs | boolean | 0
recenter_using_shifts | Recenter using aligned shifts | boolean | 1
num_extract | Number of mics to extract | integer | None
flip_x | Flip mic. in x before extract? | boolean | 0
flip_y | Flip mic. in y before extract? | boolean | 0
scale_const_override | Scale constant (override) | number | None
This information is also available as a Python object from job.full_spec.params.
Based on the title, you may use the CryoSPARC web interface to browse detailed descriptions of these parameters.
Use job.set_param() to update a parameter. Returns True if the parameter was successfully updated:
job.set_param("box_size_pix", 448)
job.set_param("recenter_using_shifts", False)
True
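When several parameters need to change at once, the calls can be wrapped in a loop. This is a minimal sketch under the assumption, taken from the description above, that job.set_param() returns True on success; apply_params is a hypothetical helper name and not part of cryosparc-tools. In practice an invalid parameter may raise an exception instead of returning a falsy value, which this sketch does not handle.

```python
def apply_params(job, params):
    """Call job.set_param() for each entry and return the names that were not updated."""
    not_updated = []
    for name, value in params.items():
        # Relies only on the truthy return value documented above.
        if not job.set_param(name, value):
            not_updated.append(name)
    return not_updated


# Possible usage while the job is still in building status:
# failed = apply_params(job, {"box_size_pix": 448, "recenter_using_shifts": False})
# assert not failed, f"Could not set: {failed}"
```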
Connecting Inputs and Outputs#
Most jobs also require inputs:
An Input is a connection to another parent job’s cryo-EM data Output
e.g., a list of micrographs, picked particles or a reconstructed volume
An Output is a group of low-level results produced when a job finishes running
Each Result includes various data and metadata about its parent output
e.g., motion correction information, computed CTF or particle blobs
Inputs have Slots that each correspond to an output result
In the CryoSPARC web interface, you may inspect the available inputs and outputs from a job’s “Inputs and Parameters” tab and “Outputs” tab, respectively.
With cryosparc-tools, use job.print_input_spec() to show a table of available input requirements for a given job.
job.print_input_spec()
Input | Title | Type | Required? | Input Slots | Slot Types | Slot Required?
=====================================================================================================
micrographs | Micrographs | exposure | ✓ (1+) | micrograph_blob | micrograph_blob | ✓
| | | | background_blob | stat_blob | ✕
| | | | mscope_params | mscope_params | ✓
| | | | ctf | ctf | ✕
particles | Particles | particle | ✓ (1+) | location | location | ✓
| | | | alignments2D | alignments2D | ✕
| | | | alignments3D | alignments3D | ✕
This information is also available as a Python object from job.inputs.
This example extraction job requires inputs micrographs and particles. These must be connected from one or more parent jobs that produce the same types (exposure and particle, respectively) as outputs, e.g., an Inspect Particle Picks job. Note also the required low-level slot connections:
micrographs
Requires one or more connections from a job output with type exposure, i.e., CTF-corrected micrographs
Must include low-level slots micrograph_blob and mscope_params
May include optional low-level slots stat_blob and ctf
particles
Requires one or more connections from a job output with type particle, i.e., particle pick locations
Must include low-level slot location
May include optional low-level slots alignments2D and alignments3D
The job cannot run if the required low-level slots are not connected. If provided, optional low-level slots may be used by the job for additional computation and results. See the main CryoSPARC Guide for details about how inputs and slots are used by specific job types.
Load the job or jobs which will provide the required inputs:
parent_job = project.find_job("J345")
parent_job.type, parent_job.status
('inspect_picks_v2', 'completed')
Inspect its outputs with job.print_output_spec():
parent_job.print_output_spec()
Output | Title | Type | Result Slots | Result Types | Passthrough?
=============================================================================================================
micrographs | Micrographs accepted | exposure | micrograph_blob | micrograph_blob | ✕
| | | ctf | ctf | ✓
| | | ctf_stats | ctf_stats | ✓
| | | rigid_motion | motion | ✓
| | | spline_motion | motion | ✓
| | | mscope_params | mscope_params | ✓
| | | background_blob | stat_blob | ✓
| | | micrograph_thumbnail_blob_1x | thumbnail_blob | ✓
| | | micrograph_thumbnail_blob_2x | thumbnail_blob | ✓
| | | micrograph_blob_non_dw | micrograph_blob | ✓
| | | micrograph_blob_non_dw_AB | micrograph_blob | ✓
| | | movie_blob | movie_blob | ✓
| | | gain_ref_blob | gain_ref_blob | ✓
particles | Particles accepted | particle | location | location | ✕
| | | ctf | ctf | ✓
| | | pick_stats | pick_stats | ✓
This information is also available as a Python list from job.outputs.
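The slot requirements of the extraction job can be compared with this output table using plain Python sets. The sketch below is illustrative only: the slot types are transcribed by hand from the two tables rather than read from job.inputs or job.outputs (whose structure is not shown here), check_slots is a hypothetical helper, and it assumes slots are matched by type.

```python
def check_slots(required, optional, provided):
    """Return (missing required slot types, optional slot types that are available)."""
    provided = set(provided)
    missing = sorted(set(required) - provided)
    available_optional = sorted(set(optional) & provided)
    return missing, available_optional


# Result types of the parent job's "micrographs" output, copied from the table above.
micrograph_output_types = {
    "micrograph_blob", "ctf", "ctf_stats", "motion", "mscope_params",
    "stat_blob", "thumbnail_blob", "movie_blob", "gain_ref_blob",
}
# Result types of the parent job's "particles" output.
particle_output_types = {"location", "ctf", "pick_stats"}

mic_missing, mic_optional = check_slots(
    required={"micrograph_blob", "mscope_params"},
    optional={"stat_blob", "ctf"},
    provided=micrograph_output_types,
)
ptl_missing, ptl_optional = check_slots(
    required={"location"},
    optional={"alignments2D", "alignments3D"},
    provided=particle_output_types,
)

print(mic_missing, mic_optional)  # [] ['ctf', 'stat_blob']
print(ptl_missing, ptl_optional)  # [] []
```

An empty first list means every required slot type is present; the second list shows which optional slot types the parent output can also supply.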
The types of the two outputs micrographs and particles match the types of the two required inputs and also have all the required slots. Connect them to the new job's inputs with the job.connect() function:
job.connect(
    target_input="micrographs",
    source_job_uid=parent_job.uid,
    source_output="micrographs",
)
job.connect(
    target_input="particles",
    source_job_uid=parent_job.uid,
    source_output="particles",
)
True
Note
The input and output names happen to match in this case, but they do not always. For example, if the parent output is named micrographs_accepted, specify source_output="micrographs_accepted".
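When input and output names differ, an explicit mapping keeps the connections readable. The following is a small sketch that uses only the job.connect() keyword arguments shown above; connect_outputs is a hypothetical helper and the micrographs_accepted name in the usage comment is the illustrative name from the note, not an output of the example parent job.

```python
def connect_outputs(job, parent_uid, mapping):
    """Connect parent outputs to job inputs.

    mapping: {target input name on `job`: source output name on the parent job}
    """
    for target_input, source_output in mapping.items():
        job.connect(
            target_input=target_input,
            source_job_uid=parent_uid,
            source_output=source_output,
        )


# Possible usage:
# connect_outputs(job, parent_job.uid, {
#     "micrographs": "micrographs_accepted",  # names differ
#     "particles": "particles",               # names match
# })
```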
Queuing and Running#
Once parameters are set and required inputs are connected, the job is ready to run. Use the job.queue() function to send the job to the CryoSPARC scheduler for execution on a given compute node or cluster.
job.queue(lane="cryoem3")
Omit the lane argument to run directly on the current workstation or master. If required, wait until the job finishes with the job.wait_for_done() function:
job.wait_for_done(error_on_incomplete=True)
'completed'
The error_on_incomplete=True flag causes a Python exception if the job fails or is killed before completing successfully.
A running job may be killed with job.kill(). A queued, completed, killed or failed job may be cleared with job.clear(). After clearing, the job goes back to building status.
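The queue and wait steps can be combined into one call. This is a minimal sketch that uses only job.queue() and job.wait_for_done() as described above; queue_and_wait is a hypothetical helper name, and what exception is raised on failure is left to the library.

```python
def queue_and_wait(job, lane=None):
    """Queue a job, block until it finishes, and return its final status string.

    With lane=None the lane argument is omitted, so the job runs on the
    current workstation or master as described above.
    """
    if lane is None:
        job.queue()
    else:
        job.queue(lane=lane)
    # Raises if the job fails or is killed before completing.
    return job.wait_for_done(error_on_incomplete=True)


# Possible usage:
# status = queue_and_wait(job, lane="cryoem3")
# print(status)
```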
Inspecting Results#
While running, jobs produce various kinds of output files and associated metadata. These include:
Files such as motion-corrected micrographs, extracted particles, reconstructed volumes, etc.
.cs file datasets with computed metadata
Image assets and plots for display in the web interface
Use the job.list_files() function to get a list of files in the job’s directory:
job.list_files()
['J1405_micrographs.csg',
'J1405_particles.csg',
'J1405_passthrough_micrographs.cs',
'J1405_passthrough_micrographs_incomplete.cs',
'J1405_passthrough_particles.cs',
'events.bson',
'extract',
'extracted_particles.cs',
'gridfs_data',
'incomplete_micrographs.cs',
'job.json',
'job.log',
'picked_micrographs.cs']
Specify a subfolder to show files in a specific subdirectory such as extract:
extracted = job.list_files("extract")
extracted[0]
'extract/000340930003298771738_14sep05c_c_00003gr_00014sq_00010hl_00002es.frames_patch_aligned_doseweighted_particles.mrc'
Any file in a job directory may be downloaded for inspection with job.download_file():
job.download_file(extracted[0], target="sample.mrc")
with open("sample.mrc", "rb") as f:
    print(f"Downloaded {len(f.read())} bytes")
Downloaded 480887808 bytes
target may be a file path or writable file handle. You may also use job.download_dataset() and job.load_output() to download .cs files directly into Dataset objects (details in the next section), or job.download_mrc() to download .mrc files as Numpy arrays:
header, data = job.download_mrc(extracted[0])
print(f"Downloaded {data.nbytes} byte particle stack with {header.nz} particles")
Downloaded 480886784 byte particle stack with 599 particles
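The two byte counts above differ by exactly 1024 bytes, which is the size of the fixed main header in the MRC format. The arithmetic below is a sketch that reproduces both numbers, assuming 32-bit floating-point pixels, no extended header, and the 448-pixel box size set earlier; expected_mrc_size is a hypothetical helper.

```python
MRC_HEADER_BYTES = 1024  # fixed-size main header of an MRC file


def expected_mrc_size(nz, ny, nx, bytes_per_pixel=4, extended_header_bytes=0):
    """Expected file size in bytes for an MRC stack of nz images of ny x nx pixels."""
    return MRC_HEADER_BYTES + extended_header_bytes + nz * ny * nx * bytes_per_pixel


data_bytes = 599 * 448 * 448 * 4  # 599 particles, 448 x 448 box, float32
print(data_bytes)                        # 480886784, matches data.nbytes
print(expected_mrc_size(599, 448, 448))  # 480887808, matches the downloaded file
```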
Datasets#
All cryo-EM data processed in CryoSPARC have associated metadata and results that must be passed between jobs. CryoSPARC uses .cs Dataset files to do this.
A Dataset is a table where each row represents a unique cryo-EM data entity such as an exposure, particle, template, volume, etc. Each column is a data field associated with that entity such as path on disk, pixel size, dimensions, X/Y position, etc.
.cs files are binary encodings of this tabular data.
Use job.download_dataset() to load a .cs file from the job directory. The dataset appears as a table when inspected in Jupyter:
particles = job.download_dataset("extracted_particles.cs")
particles
| | uid | blob/path | blob/idx | blob/shape | blob/psize_A | blob/sign | blob/import_sig | location/micrograph_uid | location/exp_group_id | location/micrograph_path | location/micrograph_shape | location/micrograph_psize_A | location/center_x_frac | location/center_y_frac | location/min_dist_A |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 4991923941886924448 | J1405/extract/014810031491354839884_14sep05c_00024sq_00003hl_00002es.frames_patch_aligned_doseweighted_particles.mrc | 0 | [448 448] | 0.6575000286102295 | -1.0 | 0 | 14810031491354839884 | 25 | J331/motioncorrected/014810031491354839884_14sep05c_00024sq_00003hl_00002es.frames_patch_aligned_doseweighted.mrc | [7676 7420] | 0.6575000286102295 | 0.41896551847457886 | 0.16500000655651093 | 100.0 |
| 1 | 7129680955360275073 | J1405/extract/014810031491354839884_14sep05c_00024sq_00003hl_00002es.frames_patch_aligned_doseweighted_particles.mrc | 1 | [448 448] | 0.6575000286102295 | -1.0 | 0 | 14810031491354839884 | 25 | J331/motioncorrected/014810031491354839884_14sep05c_00024sq_00003hl_00002es.frames_patch_aligned_doseweighted.mrc | [7676 7420] | 0.6575000286102295 | 0.7137930989265442 | 0.5883333086967468 | 100.0 |
| 2 | 10579529116942966593 | J1405/extract/014810031491354839884_14sep05c_00024sq_00003hl_00002es.frames_patch_aligned_doseweighted_particles.mrc | 2 | [448 448] | 0.6575000286102295 | -1.0 | 0 | 14810031491354839884 | 25 | J331/motioncorrected/014810031491354839884_14sep05c_00024sq_00003hl_00002es.frames_patch_aligned_doseweighted.mrc | [7676 7420] | 0.6575000286102295 | 0.7258620858192444 | 0.4300000071525574 | 100.0 |
| 3 | 6492766916285817490 | J1405/extract/014810031491354839884_14sep05c_00024sq_00003hl_00002es.frames_patch_aligned_doseweighted_particles.mrc | 3 | [448 448] | 0.6575000286102295 | -1.0 | 0 | 14810031491354839884 | 25 | J331/motioncorrected/014810031491354839884_14sep05c_00024sq_00003hl_00002es.frames_patch_aligned_doseweighted.mrc | [7676 7420] | 0.6575000286102295 | 0.12931033968925476 | 0.21833333373069763 | 100.0 |
| 4 | 1452544039939850654 | J1405/extract/014810031491354839884_14sep05c_00024sq_00003hl_00002es.frames_patch_aligned_doseweighted_particles.mrc | 4 | [448 448] | 0.6575000286102295 | -1.0 | 0 | 14810031491354839884 | 25 | J331/motioncorrected/014810031491354839884_14sep05c_00024sq_00003hl_00002es.frames_patch_aligned_doseweighted.mrc | [7676 7420] | 0.6575000286102295 | 0.20344828069210052 | 0.3083333373069763 | 100.0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 11608 | 13782326931696153388 | J1405/extract/010952247516538852012_14sep05c_c_00003gr_00014sq_00011hl_00004es.frames_patch_aligned_doseweighted_particles.mrc | 562 | [448 448] | 0.6575000286102295 | -1.0 | 0 | 10952247516538852012 | 25 | J331/motioncorrected/010952247516538852012_14sep05c_c_00003gr_00014sq_00011hl_00004es.frames_patch_aligned_doseweighted.mrc | [7676 7420] | 0.6575000286102295 | 0.30000001192092896 | 0.24666666984558105 | 100.0 |
| 11609 | 2038903213416418021 | J1405/extract/010952247516538852012_14sep05c_c_00003gr_00014sq_00011hl_00004es.frames_patch_aligned_doseweighted_particles.mrc | 563 | [448 448] | 0.6575000286102295 | -1.0 | 0 | 10952247516538852012 | 25 | J331/motioncorrected/010952247516538852012_14sep05c_c_00003gr_00014sq_00011hl_00004es.frames_patch_aligned_doseweighted.mrc | [7676 7420] | 0.6575000286102295 | 0.274137943983078 | 0.4583333432674408 | 100.0 |
| 11610 | 3302784391827733380 | J1405/extract/010952247516538852012_14sep05c_c_00003gr_00014sq_00011hl_00004es.frames_patch_aligned_doseweighted_particles.mrc | 564 | [448 448] | 0.6575000286102295 | -1.0 | 0 | 10952247516538852012 | 25 | J331/motioncorrected/010952247516538852012_14sep05c_c_00003gr_00014sq_00011hl_00004es.frames_patch_aligned_doseweighted.mrc | [7676 7420] | 0.6575000286102295 | 0.47068965435028076 | 0.4650000035762787 | 100.0 |
| 11611 | 3047055336306861916 | J1405/extract/010952247516538852012_14sep05c_c_00003gr_00014sq_00011hl_00004es.frames_patch_aligned_doseweighted_particles.mrc | 565 | [448 448] | 0.6575000286102295 | -1.0 | 0 | 10952247516538852012 | 25 | J331/motioncorrected/010952247516538852012_14sep05c_c_00003gr_00014sq_00011hl_00004es.frames_patch_aligned_doseweighted.mrc | [7676 7420] | 0.6575000286102295 | 0.48965516686439514 | 0.038333334028720856 | 100.0 |
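The location fields store particle centers as fractions of the micrograph dimensions, so converting them to pixels is a single multiplication. The sketch below works on plain numbers copied from row 0 of the table instead of the Dataset object itself. It assumes that location/micrograph_shape is ordered as (rows, columns), i.e. (height, width); both function names are hypothetical.

```python
def center_in_pixels(center_x_frac, center_y_frac, micrograph_shape):
    """Convert fractional particle centers to pixel coordinates.

    Assumes micrograph_shape is (n_rows, n_cols), i.e. (height, width).
    """
    n_rows, n_cols = micrograph_shape
    return center_x_frac * n_cols, center_y_frac * n_rows


def angstroms_to_pixels(distance_A, psize_A):
    """Convert a distance in Angstroms to pixels using the pixel size in A/pixel."""
    return distance_A / psize_A


# Values from row 0 of the table above.
x_pix, y_pix = center_in_pixels(0.41896551847457886, 0.16500000655651093, (7676, 7420))
min_dist_pix = angstroms_to_pixels(100.0, 0.6575000286102295)

print(round(x_pix), round(y_pix))  # 3109 1267
print(round(min_dist_pix, 1))      # 152.1
```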