cryosparc.job#
Defines the Job and ExternalJob classes for accessing CryoSPARC jobs.
Classes:
ExternalJob
Mutable custom output job with customizable input slots and output results.
Job
Accessor class to a job in CryoSPARC with ability to load inputs and outputs, add to job log, download job files.
Data:
GROUP_NAME_PATTERN
Input and output result groups may only contain letters, numbers and underscores.
- class cryosparc.job.ExternalJob(cs: CryoSPARC, project_uid: str, uid: str)#
Mutable custom output job with customizable input slots and output results. Use external jobs to save cryo-EM data generated by a software package outside of CryoSPARC.
Created external jobs may be connected to any other CryoSPARC job result as an input. Their outputs must be created manually and may be configured to pass through inherited input fields, just as with regular CryoSPARC jobs.
Create a new External Job with Project.create_external_job. ExternalJob is a subclass of Job and inherits all its methods.
- uid#
Job unique ID, e.g., “J42”
- Type:
str
- project_uid#
Project unique ID, e.g., “P3”
- Type:
str
- doc#
All job data from the CryoSPARC database. Database contents may change over time, use the refresh method to update.
- Type:
Examples
Import multiple exposure groups into a single job
>>> from cryosparc.tools import CryoSPARC
>>> cs = CryoSPARC()
>>> project = cs.find_project("P3")
>>> job = project.create_external_job("W3", title="Import Image Sets")
>>> for i in range(3):
...     dset = job.add_output(
...         type="exposure",
...         name=f"images_{i}",
...         slots=["movie_blob", "mscope_params", "gain_ref_blob"],
...         alloc=10  # allocate a dataset for this output with 10 rows
...     )
...     dset['movie_blob/path'] = ...  # populate dataset
...     job.save_output(f"images_{i}", dset)
Methods:
add_input(type[, name, min, max, slots, title])
Add an input slot to the current job.
add_output(type[, name, slots, passthrough, title, alloc])
Add an output slot to the current job.
alloc_output(name[, alloc])
Allocate an empty dataset for the given output with the given name.
connect(target_input, source_job_uid, source_output, *[, slots, title, desc, refresh])
Connect the given input for this job to an output with given job UID and name.
kill()
Kill this job.
queue([lane, hostname, gpus, cluster_vars])
Queue a job to a target lane.
run()
Start a job within a context manager and stop the job when the context ends.
save_output(name, dataset, *[, refresh])
Save output dataset to external job.
start([status])
Set job status to "running" or "waiting".
stop([error])
Set job status to "completed" or "failed".
- add_input(type: Literal['exposure', 'particle', 'template', 'volume', 'volume_multi', 'mask', 'live', 'ml_model', 'symmetry_candidate', 'flex_mesh', 'flex_model', 'hyperparameter', 'denoise_model', 'annotation_model'], name: str | None = None, min: int = 0, max: int | Literal['inf'] = 'inf', slots: Iterable[str | Datafield] = [], title: str | None = None)#
Add an input slot to the current job. May be connected to zero or more outputs from other jobs (depending on the min and max values).
- Parameters:
type (Datatype) – cryo-EM data type for this input, e.g., "particle"
name (str, optional) – Input name key, e.g., "picked_particles". Defaults to None.
min (int, optional) – Minimum number of required input connections. Defaults to 0.
max (int | Literal["inf"], optional) – Maximum number of input connections. Specify "inf" for unlimited connections. Defaults to "inf".
slots (list[SlotSpec], optional) – List of slots that should be connected to this input, such as "location" or "blob". Defaults to [].
title (str, optional) – Human-readable title for this input. Defaults to None.
- Raises:
CommandError – General CryoSPARC network access error such as timeout, URL or HTTP
InvalidSlotsError – slots argument is invalid
- Returns:
name of created input
- Return type:
str
Examples
Create an external job that accepts micrographs as input:
>>> cs = CryoSPARC()
>>> project = cs.find_project("P3")
>>> job = project.create_external_job("W1", title="Custom Picker")
>>> job.uid
"J3"
>>> job.add_input(
...     type="exposure",
...     name="input_micrographs",
...     min=1,
...     slots=["micrograph_blob", "ctf"],
...     title="Input micrographs for picking"
... )
"input_micrographs"
- add_output(type: Literal['exposure', 'particle', 'template', 'volume', 'volume_multi', 'mask', 'live', 'ml_model', 'symmetry_candidate', 'flex_mesh', 'flex_model', 'hyperparameter', 'denoise_model', 'annotation_model'], name: str | None = None, slots: List[str | Datafield] = [], passthrough: str | None = None, title: str | None = None, *, alloc: Literal[None] = None) str #
- add_output(type: Literal['exposure', 'particle', 'template', 'volume', 'volume_multi', 'mask', 'live', 'ml_model', 'symmetry_candidate', 'flex_mesh', 'flex_model', 'hyperparameter', 'denoise_model', 'annotation_model'], name: str | None = None, slots: List[str | Datafield] = [], passthrough: str | None = None, title: str | None = None, *, alloc: int | Dataset = None) Dataset
Add an output slot to the current job. Optionally returns the corresponding empty dataset if alloc is specified.
- Parameters:
type (Datatype) – cryo-EM datatype for this output, e.g., "particle"
name (str, optional) – Output name key, e.g., "selected_particles". Same as type if not specified. Defaults to None.
slots (list[SlotSpec], optional) – List of slots expected to be created for this output, such as location or blob. Do not specify any slots that were passed through from an input unless those slots are modified in the output. Defaults to [].
passthrough (str, optional) – Indicates that this output inherits slots from an existing input with the specified name. The input must first be added with add_input(). Defaults to None.
title (str, optional) – Human-readable title for this output. Defaults to None.
alloc (int | Dataset, optional) – If specified, pre-allocate and return a dataset with the requested slots. Specify an integer to allocate a specific number of rows. Specify a Dataset from which to inherit unique row IDs (useful when adding passthrough outputs). Defaults to None.
- Raises:
CommandError – General CryoSPARC network access error such as timeout, URL or HTTP
InvalidSlotsError – slots argument is invalid
- Returns:
Name of the created output. If alloc is specified as an integer, instead returns a blank dataset with the given size and random UIDs. If alloc is specified as a Dataset, returns a blank dataset with the same UIDs.
- Return type:
str | Dataset
Examples
Create and allocate an output for new particle picks
>>> cs = CryoSPARC()
>>> project = cs.find_project("P3")
>>> job = project.find_external_job("J3")
>>> particles_dset = job.add_output(
...     type="particle",
...     name="picked_particles",
...     slots=["location", "pick_stats"],
...     alloc=10000
... )
Create an inherited output for input micrographs
>>> job.add_output(
...     type="exposure",
...     name="picked_micrographs",
...     passthrough="input_micrographs",
...     title="Passthrough picked micrographs"
... )
"picked_micrographs"
Create an output with multiple slots of the same type
>>> job.add_output(
...     type="particle",
...     name="particle_alignments",
...     slots=[
...         {"dtype": "alignments3D", "prefix": "alignments_class_0", "required": True},
...         {"dtype": "alignments3D", "prefix": "alignments_class_1", "required": True},
...         {"dtype": "alignments3D", "prefix": "alignments_class_2", "required": True},
...     ]
... )
"particle_alignments"
- alloc_output(name: str, alloc: int | ArrayLike | Dataset = 0) Dataset #
Allocate an empty dataset for the given output with the given name. Initialize with the given number of empty rows. The result may be used with save_output with the same output name.
- Parameters:
name (str) – Name of job output to allocate
alloc (int | ArrayLike | Dataset, optional) – Specify as one of the following: (A) an integer to allocate a specific number of rows, (B) a numpy array of numbers to use for UIDs in the allocated dataset or (C) a dataset from which to inherit unique row IDs (useful for allocating passthrough outputs). Defaults to 0.
- Returns:
Empty dataset with the given number of rows
- Return type:
Dataset
Examples
Allocate a dataset of size 10,000 for an output for new particle picks
>>> cs = CryoSPARC()
>>> project = cs.find_project("P3")
>>> job = project.find_external_job("J3")
>>> job.alloc_output("picked_particles", 10000)
Dataset([  # 10000 items, 11 fields
    ("uid": [...]),
    ("location/micrograph_path", ["", ...]),
    ...
])
Allocate a dataset from an existing input passthrough dataset
>>> input_micrographs = job.load_input("input_micrographs")
>>> job.alloc_output("picked_micrographs", input_micrographs)
Dataset([  # same "uid" field as input_micrographs
    ("uid": [...]),
])
- connect(target_input: str, source_job_uid: str, source_output: str, *, slots: List[str | Datafield] = [], title: str = '', desc: str = '', refresh: bool = True) bool #
Connect the given input for this job to an output with given job UID and name. If this input does not exist, it will be added with the given slots. At least one slot must be specified if the input does not exist.
- Parameters:
target_input (str) – Input name to connect into. Will be created if does not already exist.
source_job_uid (str) – Job UID to connect from, e.g., “J42”
source_output (str) – Job output name to connect from, e.g., "particles"
slots (list[SlotSpec], optional) – List of slots to add to created input. All if not specified. Defaults to [].
title (str, optional) – Human readable title for created input. Defaults to “”.
desc (str, optional) – Human readable description for created input. Defaults to “”.
refresh (bool, optional) – Auto-refresh job document after connecting. Defaults to True.
- Raises:
CommandError – General CryoSPARC network access error such as timeout, URL or HTTP
InvalidSlotsError – slots argument is invalid
Examples
Connect J3 to CTF-corrected micrographs from J2's micrographs output.
>>> cs = CryoSPARC()
>>> project = cs.find_project("P3")
>>> job = project.find_external_job("J3")
>>> job.connect("input_micrographs", "J2", "micrographs")
- kill()#
Kill this job.
- queue(lane: str | None = None, hostname: str | None = None, gpus: List[int] = [], cluster_vars: Dict[str, Any] = {})#
Queue a job to a target lane. Available lanes may be queried with CryoSPARC.get_lanes.
Optionally specify a hostname for a node or cluster in the given lane. Optionally specify specific GPUs indexes to use for computation.
Available hostnames for a given lane may be queried with CryoSPARC.get_targets.
- Parameters:
lane (str, optional) – Configured compute lane to queue to. Leave unspecified to run directly on the master or current workstation. Defaults to None.
hostname (str, optional) – Specific hostname in compute lane, if more than one is available. Defaults to None.
gpus (list[int], optional) – GPUs to queue to. If specified, must have as many GPUs as required in job parameters. Leave unspecified to use first available GPU(s). Defaults to [].
cluster_vars (dict[str, Any], optional) – Specify custom cluster variables when queuing to a cluster. Keys are variable names. Defaults to {}.
Examples
Queue a job to lane named “worker”:
>>> cs = CryoSPARC()
>>> job = cs.find_job("P3", "J42")
>>> job.status
"building"
>>> job.queue("worker")
>>> job.status
"queued"
- run()#
Start a job within a context manager and stop the job when the context ends.
- Yields:
ExternalJob – self.
Examples
Job will be marked as “failed” if the contents of the block throw an exception
>>> with job.run():
...     job.save_output(...)
- save_output(name: str, dataset: Dataset, *, refresh: bool = True)#
Save output dataset to external job.
- Parameters:
name (str) – Name of output on this job.
dataset (Dataset) – Value of output with only required fields.
refresh (bool, optional) – Auto-refresh job document after saving. Defaults to True.
Examples
Save a previously-allocated output.
>>> cs = CryoSPARC()
>>> project = cs.find_project("P3")
>>> job = project.find_external_job("J3")
>>> particles = job.alloc_output("picked_particles", 10000)
>>> job.save_output("picked_particles", particles)
- start(status: Literal['running', 'waiting'] = 'waiting')#
Set job status to “running” or “waiting”
- Parameters:
status (str, optional) – “running” or “waiting”. Defaults to “waiting”.
- stop(error=False)#
Set job status to “completed” or “failed”
- Parameters:
error (bool, optional) – Job completed with errors. Defaults to False.
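Together, start and stop bracket the lifetime of a manually managed external job (run is the context-manager equivalent). A sketch, assuming a reachable CryoSPARC instance; the UIDs and the output name "picked_particles" are hypothetical:

```python
# Sketch of the manual start/stop lifecycle for an external job.
from cryosparc.tools import CryoSPARC

cs = CryoSPARC()
job = cs.find_project("P3").find_external_job("J3")
job.start("running")              # mark the job as running
try:
    particles = job.alloc_output("picked_particles", 100)
    # ... populate the dataset fields here ...
    job.save_output("picked_particles", particles)
    job.stop()                    # sets status to "completed"
except Exception:
    job.stop(error=True)          # sets status to "failed"
    raise
```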
- cryosparc.job.GROUP_NAME_PATTERN = '^[A-Za-z][0-9A-Za-z_]*$'#
Input and output result groups may only contain letters, numbers and underscores.
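The pattern can be checked locally before creating inputs or outputs; a minimal, self-contained sketch:

```python
import re

# Same pattern as cryosparc.job.GROUP_NAME_PATTERN: a letter followed
# by any mix of letters, numbers and underscores.
GROUP_NAME_PATTERN = r"^[A-Za-z][0-9A-Za-z_]*$"

def is_valid_group_name(name: str) -> bool:
    """Return True if name is usable as an input/output result group name."""
    return re.match(GROUP_NAME_PATTERN, name) is not None

print(is_valid_group_name("picked_particles"))  # True
print(is_valid_group_name("2d_classes"))        # False: starts with a digit
print(is_valid_group_name("picked-particles"))  # False: hyphen not allowed
```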
- class cryosparc.job.Job(cs: CryoSPARC, project_uid: str, uid: str)#
Accessor class to a job in CryoSPARC with ability to load inputs and outputs, add to job log, download job files. Should be instantiated through CryoSPARC.find_job or Project.find_job.
- uid#
Job unique ID, e.g., “J42”
- Type:
str
- project_uid#
Project unique ID, e.g., “P3”
- Type:
str
- doc#
All job data from the CryoSPARC database. Database contents may change over time, use the refresh method to update.
- Type:
Examples
Find an existing job.
>>> cs = CryoSPARC()
>>> job = cs.find_job("P3", "J42")
>>> job.status
"building"
Queue a job.
>>> job.queue("worker_lane")
>>> job.status
"queued"
Create a 3-class ab-initio job connected to existing particles.
>>> job = cs.create_job("P3", "W1", "homo_abinit",
...     connections={"particles": ("J20", "particles_selected")},
...     params={"abinit_K": 3}
... )
>>> job.queue()
>>> job.status
"queued"
Methods:
clear()
Clear this job and reset to building status.
connect(target_input, source_job_uid, source_output, *[, refresh])
Connect the given input for this job to an output with given job UID and name.
cp(source_path[, target_path])
Copy a file or folder into the job directory.
dir()
Get the path to the job directory.
disconnect(target_input[, connection_idx, refresh])
Clear the given job input group.
download(path)
Initiate a download request for a file inside the job's directory.
download_asset(fileid, target)
Download a job asset from the database with the given ID.
download_dataset(path)
Download a .cs dataset file from the given path in the job directory.
download_file(path[, target])
Download file from job directory to the given target path or writeable file handle.
download_mrc(path)
Download a .mrc file from the given relative path in the job directory.
interact(action[, body, timeout, refresh])
Call an interactive action on a waiting interactive job.
kill()
Kill this job.
list_assets()
Get a list of files available in the database for this job.
list_files([prefix, recursive])
Get a list of files inside the job directory.
load_input(name[, slots])
Load the dataset connected to the job's input with the given name.
load_output(name[, slots, version])
Load the dataset for the job's output with the given name.
log(text[, level])
Append to a job's event log.
log_checkpoint([meta])
Append a checkpoint to the job's event log.
log_plot(figure, text[, formats, raw_data, ...])
Add a log line with the given figure.
mkdir(target_path[, parents, exist_ok])
Create a folder in the given job.
print_input_spec()
Print a table of input keys, their title, type, connection requirements and details about their low-level required slots.
print_output_spec()
Print a table of output keys, their title, type and details about their low-level results.
print_param_spec()
Print a table of parameter keys, their title, type and default to standard output.
queue([lane, hostname, gpus, cluster_vars])
Queue a job to a target lane.
refresh()
Reload this job from the CryoSPARC database.
set_param(name, value, *[, refresh])
Set the given param name on the current job to the given value.
subprocess(args[, mute, checkpoint, ...])
Launch a subprocess and write its text-based output and error to the job log.
symlink(source_path[, target_path])
Create a symbolic link in job's directory.
upload(target_path, source, *[, overwrite])
Upload the given file to the job directory at the given path.
upload_asset(file[, filename, format])
Upload an image or text file to the current job.
upload_dataset(target_path, dset, *[, ...])
Upload a dataset as a CS file into the job directory.
upload_mrc(target_path, data, psize, *[, ...])
Upload a numpy 2D or 3D array to the job directory as an MRC file.
upload_plot(figure[, name, formats, ...])
Upload the given figure.
wait_for_done(*[, error_on_incomplete, timeout])
Wait until a job reaches status "completed", "killed" or "failed".
wait_for_status(status, *[, timeout])
Wait for a job's status to reach the specified value.
Attributes:
status
Scheduling status.
type
Job type key.
- clear()#
Clear this job and reset to building status.
- connect(target_input: str, source_job_uid: str, source_output: str, *, refresh: bool = True) bool #
Connect the given input for this job to an output with given job UID and name.
- Parameters:
target_input (str) – Input name to connect into. Will be created if not specified.
source_job_uid (str) – Job UID to connect from, e.g., “J42”
source_output (str) – Job output name to connect from, e.g., "particles"
refresh (bool, optional) – Auto-refresh job document after connecting. Defaults to True.
- Returns:
False if the job encountered a build error.
- Return type:
bool
Examples
Connect J3 to CTF-corrected micrographs from J2's micrographs output.
>>> cs = CryoSPARC()
>>> project = cs.find_project("P3")
>>> job = project.find_job("J3")
>>> job.connect("input_micrographs", "J2", "micrographs")
- cp(source_path: str | PurePosixPath, target_path: str | PurePosixPath = '')#
Copy a file or folder into the job directory.
- Parameters:
source_path (str | Path) – Relative or absolute path of source file or folder to copy. If relative, assumed to be within the job directory.
target_path (str | Path, optional) – Name or path in the job directory to copy into. If not specified, uses the same file name as the source. Defaults to “”.
- dir() PurePosixPath #
Get the path to the job directory.
- Returns:
job directory Pure Path instance
- Return type:
Path
- disconnect(target_input: str, connection_idx: int | None = None, *, refresh: bool = True)#
Clear the given job input group.
- Parameters:
target_input (str) – Name of input to disconnect
connection_idx (int, optional) – Connection index to clear. Set to 0 to clear the first connection, 1 for the second, etc. If unspecified, clears all connections. Defaults to None.
refresh (bool, optional) – Auto-refresh job document after disconnecting. Defaults to True.
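A sketch of clearing a single connection versus the whole input group; the job UID and input name are hypothetical:

```python
# Assumes a job fetched as in the surrounding examples.
job = cs.find_job("P3", "J42")

# Clear only the second connection on the "particles" input...
job.disconnect("particles", connection_idx=1)

# ...or clear every connection on that input at once.
job.disconnect("particles")
```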
- download(path: str | PurePosixPath)#
Initiate a download request for a file inside the job’s directory. Use to get files from a remote CryoSPARC instance where the job directory is not available on the client file system.
- Parameters:
path (str | Path) – Name or path of file in job directory.
- Yields:
HTTPResponse –
- Use a context manager to read the file from the
request body.
Examples
Download a job’s metadata
>>> cs = CryoSPARC()
>>> job = cs.find_job("P3", "J42")
>>> with job.download("job.json") as res:
...     job_data = json.loads(res.read())
- download_asset(fileid: str, target: str | PurePath | IO[bytes])#
Download a job asset from the database with the given ID. Note that the file does not necessarily have to belong to the current job.
- Parameters:
fileid (str) – GridFS file object ID
target (str | Path | IO) – Local file path, directory path or writeable file handle to write response data.
- Returns:
resulting target path or file handle.
- Return type:
Path | IO
- download_dataset(path: str | PurePosixPath)#
Download a .cs dataset file from the given path in the job directory.
- Parameters:
path (str | Path) – Name or path of .cs file in job directory.
- Returns:
Loaded dataset instance
- Return type:
Dataset
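A sketch of pulling a dataset file from a remote job directory and inspecting it locally; the file name is hypothetical:

```python
# Download a .cs file from the job directory without needing filesystem
# access to the CryoSPARC instance.
particles = job.download_dataset("extracted_particles.cs")
print(len(particles))      # number of rows
print(particles.fields())  # available fields, e.g. "blob/path"
```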
- download_file(path: str | PurePosixPath, target: str | PurePath | IO[bytes] = '')#
Download file from job directory to the given target path or writeable file handle.
- Parameters:
path (str | Path) – Name or path of file in job directory.
target (str | Path | IO) – Local file path, directory path or writeable file handle to write response data. If not specified, downloads to current working directory with same file name. Defaults to “”.
- Returns:
resulting target path or file handle.
- Return type:
Path | IO
- download_mrc(path: str | PurePosixPath)#
Download a .mrc file from the given relative path in the job directory.
- Parameters:
path (str | Path) – Name or path of .mrc file in job directory.
- Returns:
MRC file header and data as a numpy array
- Return type:
tuple[Header, NDArray]
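A sketch of fetching a volume as a numpy array; the file name is hypothetical:

```python
# Returns the MRC header and the voxel data as a numpy array.
header, vol = job.download_mrc("J42_map_sharp.mrc")
print(vol.shape)  # e.g. a cubic volume such as (256, 256, 256)
print(vol.dtype)
```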
- interact(action: str, body: Any = {}, *, timeout: int = 10, refresh: bool = False) Any #
Call an interactive action on a waiting interactive job. The possible actions and expected body depends on the job type.
- Parameters:
action (str) – Interactive endpoint to call.
body (any) – Body parameters for the interactive endpoint. Must be JSON-encodable.
timeout (int, optional) – Maximum time to wait for the action to complete, in seconds. Defaults to 10.
refresh (bool, optional) – If True, refresh the job document after posting. Defaults to False.
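A sketch of posting to an interactive job. The action name and body below are entirely hypothetical; the valid actions and their expected body schema depend on the specific interactive job type:

```python
# Hypothetical interactive call; consult the target job type's
# documentation for real action names and body fields.
result = job.interact(
    "get_state_update",   # hypothetical endpoint name
    {"threshold": 0.5},   # body must be JSON-encodable
    timeout=30,
)
```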
- kill()#
Kill this job.
- list_assets() List[AssetDetails] #
Get a list of files available in the database for this job. Returns a list with details about the assets. Each entry is a dict with an _id key which may be used to download the file with the download_asset method.
- Returns:
Asset details
- Return type:
list[AssetDetails]
- list_files(prefix: str | PurePosixPath = '', recursive: bool = False) List[str] #
Get a list of files inside the job directory.
- Parameters:
prefix (str | Path, optional) – Subdirectory inside job to list. Defaults to “”.
recursive (bool, optional) – If True, lists files recursively. Defaults to False.
- Returns:
List of file paths relative to the job directory.
- Return type:
list[str]
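A sketch of listing job files; the subdirectory name is an assumption:

```python
# Top-level files, then everything under a (hypothetical) subdirectory.
print(job.list_files())
print(job.list_files("gridfs_data", recursive=True))
```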
- load_input(name: str, slots: Iterable[str] = [])#
Load the dataset connected to the job’s input with the given name.
- Parameters:
name (str) – Input to load
slots (list[str], optional) – List of specific slots to load, such as movie_blob or location, or all slots if not specified. Defaults to [].
- Raises:
TypeError – If the job doesn't have the given input or the dataset cannot be loaded.
- Returns:
Loaded dataset
- Return type:
Dataset
- load_output(name: str, slots: Iterable[str] = [], version: int | Literal['F'] = 'F')#
Load the dataset for the job’s output with the given name.
- Parameters:
name (str) – Output to load
slots (list[str], optional) – List of specific slots to load, such as movie_blob or location, or all slots if not specified (including passthrough). Defaults to [].
version (int | Literal["F"], optional) – Specific output version to load. Use this to load the output at different stages of processing. Leave unspecified to load the final version. Defaults to "F".
- Raises:
TypeError – If job does not have any results for the given output
- Returns:
Loaded dataset
- Return type:
Dataset
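A sketch combining load_input and load_output; loading only the slots you need keeps transfers small. The job, input and output names are hypothetical, and the slot names follow the examples elsewhere on this page:

```python
# Load selected slots from an output and an input of the same job.
picks = job.load_output("picked_particles", slots=["location"])
micrographs = job.load_input("input_micrographs", slots=["micrograph_blob"])

for path in micrographs["micrograph_blob/path"]:
    ...  # process each micrograph file
```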
- log(text: str, level: Literal['text', 'warning', 'error'] = 'text')#
Append to a job’s event log.
- Parameters:
text (str) – Text to log
level (str, optional) – Log level (“text”, “warning” or “error”). Defaults to “text”.
- Returns:
Created log event ID
- Return type:
str
- log_checkpoint(meta: dict = {})#
Append a checkpoint to the job’s event log.
- Parameters:
meta (dict, optional) – Additional meta information. Defaults to {}.
- Returns:
Created checkpoint event ID
- Return type:
str
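A sketch of writing progress to the event log, with a checkpoint separating processing stages; the messages are made up:

```python
# Plain text, a checkpoint marker, a warning, and the returned event ID.
job.log("Beginning particle extraction")
job.log_checkpoint()
job.log("Low particle count on micrograph 17", level="warning")
event_id = job.log("Extraction complete")  # returns the created event ID
```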
- log_plot(figure: str | PurePath | IO[bytes] | Any, text: str, formats: Iterable[Literal['pdf', 'gif', 'jpg', 'jpeg', 'png', 'svg']] = ['png', 'pdf'], raw_data: str | bytes | None = None, raw_data_file: str | PurePath | IO[bytes] | None = None, raw_data_format: Literal['txt', 'csv', 'html', 'json', 'xml', 'bild', 'bld'] | None = None, flags: List[str] = ['plots'], savefig_kw: dict = {'bbox_inches': 'tight', 'pad_inches': 0})#
Add a log line with the given figure. figure must be one of the following:
- Path to an existing image file in PNG, JPEG, GIF, SVG or PDF format
- A file handle-like object with the binary data of an image
- A matplotlib plot
If a matplotlib figure is specified, uploads the plots in png and pdf formats. Override the formats argument with formats=['<format1>', '<format2>', ...] to save in different image formats.
If a text version of the given plot is available (e.g., in csv format), specify raw_data with the full contents or raw_data_file with a path or binary file handle pointing to the contents. Assumes file format from extension or raw_data_format. Defaults to "txt" if the format cannot be determined.
- Parameters:
figure (str | Path | IO | Figure) – Image file path, file handle or matplotlib figure instance
text (str) – Associated description for given figure
formats (list[ImageFormat], optional) – Image formats to save plot into. If figure is a file handle, specify formats=['<format>'], where <format> is a valid image extension such as png or pdf. Assumes png if not specified. Defaults to ["png", "pdf"].
raw_data (str | bytes, optional) – Raw text data for associated plot, generally in CSV, XML or JSON format. Cannot be specified with raw_data_file. Defaults to None.
raw_data_file (str | Path | IO, optional) – Path to raw text data. Cannot be specified with raw_data. Defaults to None.
raw_data_format (TextFormat, optional) – Format for raw text data. Defaults to None.
flags (list[str], optional) – Flags to use for UI rendering. Generally should not be specified. Defaults to ["plots"].
savefig_kw (dict, optional) – If a matplotlib figure is specified, optionally specify keyword arguments for the savefig method. Defaults to dict(bbox_inches="tight", pad_inches=0).
- Returns:
Created log event ID
- Return type:
str
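A sketch of logging a matplotlib figure together with a CSV of the underlying values; assumes matplotlib is installed and the data shown is made up:

```python
import matplotlib.pyplot as plt

# Build a simple figure, then log it with raw CSV data attached.
fig, ax = plt.subplots()
ax.plot([1, 2, 3], [0.9, 0.7, 0.4])
ax.set_xlabel("Iteration")
ax.set_ylabel("Error")

job.log_plot(
    fig,
    "Error per iteration",
    raw_data="iteration,error\n1,0.9\n2,0.7\n3,0.4",
    raw_data_format="csv",
)
```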
- mkdir(target_path: str | PurePosixPath, parents: bool = False, exist_ok: bool = False)#
Create a folder in the given job.
- Parameters:
target_path (str | Path) – Name or path of folder to create inside the job directory.
parents (bool, optional) – If True, any missing parents are created as needed. Defaults to False.
exist_ok (bool, optional) – If True, does not raise an error for existing directories. Still raises if the target path is not a directory. Defaults to False.
- print_input_spec()#
Print a table of input keys, their title, type, connection requirements and details about their low-level required slots.
The “Required?” heading also shows the number of outputs that must be connected to the input for this job to run.
Examples
>>> cs = CryoSPARC()
>>> job = cs.find_job("P3", "J42")
>>> job.doc['type']
'extract_micrographs_multi'
>>> job.print_input_spec()
Input       | Title       | Type     | Required? | Input Slots     | Slot Types      | Slot Required?
=====================================================================================================
micrographs | Micrographs | exposure | ✓ (1+)    | micrograph_blob | micrograph_blob | ✓
            |             |          |           | mscope_params   | mscope_params   | ✓
            |             |          |           | background_blob | stat_blob       | ✕
            |             |          |           | ctf             | ctf             | ✕
particles   | Particles   | particle | ✕ (0+)    | location        | location        | ✓
            |             |          |           | alignments2D    | alignments2D    | ✕
            |             |          |           | alignments3D    | alignments3D    | ✕
- print_output_spec()#
Print a table of output keys, their title, type and details about their low-level results.
Examples
>>> cs = CryoSPARC()
>>> job = cs.find_job("P3", "J42")
>>> job.doc['type']
'extract_micrographs_multi'
>>> job.print_output_spec()
Output      | Title       | Type     | Result Slots           | Result Types
==========================================================================================
micrographs | Micrographs | exposure | micrograph_blob        | micrograph_blob
            |             |          | micrograph_blob_non_dw | micrograph_blob
            |             |          | background_blob        | stat_blob
            |             |          | ctf                    | ctf
            |             |          | ctf_stats              | ctf_stats
            |             |          | mscope_params          | mscope_params
particles   | Particles   | particle | blob                   | blob
            |             |          | ctf                    | ctf
- print_param_spec()#
Print a table of parameter keys, their title, type and default to standard output:
Examples
>>> cs = CryoSPARC()
>>> job = cs.find_job("P3", "J42")
>>> job.doc['type']
'extract_micrographs_multi'
>>> job.print_param_spec()
Param            | Title                 | Type   | Default
=======================================================================
box_size_pix     | Extraction box size   | number | 256
bin_size_pix     | Fourier crop box size | number | None
compute_num_gpus | Number of GPUs        | number | 1
...
- queue(lane: str | None = None, hostname: str | None = None, gpus: List[int] = [], cluster_vars: Dict[str, Any] = {})#
Queue a job to a target lane. Available lanes may be queried with CryoSPARC.get_lanes.
Optionally specify a hostname for a node or cluster in the given lane. Optionally specify specific GPUs indexes to use for computation.
Available hostnames for a given lane may be queried with CryoSPARC.get_targets.
- Parameters:
lane (str, optional) – Configured compute lane to queue to. Leave unspecified to run directly on the master or current workstation. Defaults to None.
hostname (str, optional) – Specific hostname in compute lane, if more than one is available. Defaults to None.
gpus (list[int], optional) – GPUs to queue to. If specified, must have as many GPUs as required in job parameters. Leave unspecified to use first available GPU(s). Defaults to [].
cluster_vars (dict[str, Any], optional) – Specify custom cluster variables when queuing to a cluster. Keys are variable names. Defaults to {}.
Examples
Queue a job to lane named “worker”:
>>> cs = CryoSPARC()
>>> job = cs.find_job("P3", "J42")
>>> job.status
"building"
>>> job.queue("worker")
>>> job.status
"queued"
- set_param(name: str, value: Any, *, refresh: bool = True) bool #
Set the given param name on the current job to the given value. Only works if the job is in “building” status.
- Parameters:
name (str) – Param name, as defined in the job document's params_base.
value (any) – Target parameter value.
refresh (bool, optional) – Auto-refresh job document after setting. Defaults to True.
- Returns:
False if the job encountered a build error.
- Return type:
bool
Examples
Set the number of GPUs used by a supported job
>>> cs = CryoSPARC()
>>> job = cs.find_job("P3", "J42")
>>> job.set_param("compute_num_gpus", 4)
True
- property status: Literal['building', 'queued', 'launched', 'started', 'running', 'waiting', 'completed', 'killed', 'failed']#
scheduling status.
- Type:
JobStatus
- subprocess(args: str | list, mute: bool = False, checkpoint: bool = False, checkpoint_line_pattern: str | Pattern[str] | None = None, **kwargs)#
Launch a subprocess and write its text-based output and error to the job log.
- Parameters:
args (str | list) – Process arguments to run
mute (bool, optional) – If True, does not also forward process output to standard output. Defaults to False.
checkpoint (bool, optional) – If True, creates a checkpoint in the job event log just before process output begins. Defaults to False.
checkpoint_line_pattern (str | Pattern[str], optional) – Regular expression to match checkpoint lines for processes with a lot of output. If a process outputs a line that matches this pattern, a checkpoint is created in the event log before this line is forwarded. Defaults to None.
**kwargs – Additional keyword arguments for
subprocess.Popen
.
- Raises:
TypeError – For invalid arguments
RuntimeError – If process exits with non-zero status code
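A sketch of forwarding an external tool's output into the job log; the second command is hypothetical, and checkpoint_line_pattern marks iteration boundaries in verbose output:

```python
# Simple command, with a checkpoint logged before output begins.
job.subprocess(["echo", "hello from an external tool"], checkpoint=True)

# Hypothetical long-running tool; a checkpoint is created each time a
# line matching the pattern appears in its output.
job.subprocess(
    ["my_tool", "--input", "data.mrc"],   # hypothetical command
    mute=True,
    checkpoint_line_pattern=r"^Iteration \d+",
)
```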
- symlink(source_path: str | PurePosixPath, target_path: str | PurePosixPath = '')#
Create a symbolic link in job’s directory.
- Parameters:
source_path (str | Path) – Relative or absolute path of source file or folder to create a link to. If relative, assumed to be within the job directory.
target_path (str | Path) – Name or path of new symlink in the job directory. If not specified, creates link with the same file name as the source. Defaults to “”.
- property type: str#
Job type key
- upload(target_path: str | PurePosixPath, source: str | bytes | PurePath | IO, *, overwrite: bool = False)#
Upload the given file to the job directory at the given path. Fails if target already exists.
- Parameters:
target_path (str | Path) – Name or path of file to write in job directory.
source (str | bytes | Path | IO) – Local path or file handle to upload. May also be specified as raw bytes.
overwrite (bool, optional) – If True, overwrite existing files. Defaults to False.
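A sketch of uploading from a local path and from raw bytes; all paths shown are hypothetical:

```python
# Upload a local file into the job directory...
job.upload("notes/run_config.json", "/home/user/run_config.json")

# ...or write raw bytes directly, replacing any existing file.
job.upload("notes/readme.txt", b"Generated outside CryoSPARC", overwrite=True)
```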
- upload_asset(file: str | PurePath | IO[bytes], filename: str | None = None, format: Literal['txt', 'csv', 'html', 'json', 'xml', 'bild', 'bld'] | Literal['pdf', 'gif', 'jpg', 'jpeg', 'png', 'svg'] | None = None) EventLogAsset #
Upload an image or text file to the current job. Specify either an image (PNG, JPG, GIF, PDF, SVG), text file (TXT, CSV, JSON, XML) or a binary IO object with data in one of those formats.
If a binary IO object is specified, either a filename or file format must be specified.
Unlike the upload method, which saves files to the job directory, this method saves files to the database and exposes them for use in the job event log.
- Parameters:
file (str | Path | IO) – Source asset file path or handle.
filename (str, optional) – Filename of asset. If file is a handle, specify one of filename or format. Defaults to None.
format (AssetFormat, optional) – Format of the asset file. If file is a handle, specify one of filename or format. Defaults to None.
- Raises:
ValueError – If incorrect arguments specified
- Returns:
Dictionary including details about the uploaded asset.
- Return type:
EventLogAsset
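A sketch of attaching a generated CSV to the event log; because the data is passed as a binary handle, a filename (or format) must be given. The helper name and column names are illustrative.

```python
import io

def log_metrics_csv(job, values, filename="metrics.csv"):
    """Attach per-item metric values to the job's event log as a CSV asset."""
    csv_text = "index,value\n" + "\n".join(
        f"{i},{v}" for i, v in enumerate(values)
    )
    # A binary handle requires an explicit filename or format:
    return job.upload_asset(io.BytesIO(csv_text.encode("utf-8")),
                            filename=filename)
```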
- upload_dataset(target_path: str | PurePosixPath, dset: Dataset, *, format: int = 1, overwrite: bool = False)#
Upload a dataset as a CS file into the job directory. Fails if the target already exists, unless overwrite is True.
- Parameters:
target_path (str | Path) – Name or path of dataset to save in the job directory. Should have a .cs extension.
dset (Dataset) – Dataset to save.
format (int) – Format to save in, from cryosparc.dataset.*_FORMAT. Defaults to NUMPY_FORMAT.
overwrite (bool, optional) – If True, overwrite existing files. Defaults to False.
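A brief sketch, assuming dset is a cryosparc.dataset.Dataset and a live connection; the name particle_scores is hypothetical.

```python
def save_scores(job, dset, name="particle_scores"):
    """Save a Dataset as <name>.cs in the job directory, overwriting any
    previous copy (illustrative helper)."""
    target = f"{name}.cs"  # the target should carry the .cs extension
    job.upload_dataset(target, dset, overwrite=True)
    return target
```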
- upload_mrc(target_path: str | PurePosixPath, data: NDArray, psize: float, *, overwrite: bool = False)#
Upload a numpy 2D or 3D array to the job directory as an MRC file. Fails if the target already exists, unless overwrite is True.
- Parameters:
target_path (str | Path) – Name or path of MRC file to save in the job directory. Should have a .mrc extension.
data (NDArray) – Numpy array with MRC file data.
psize (float) – Pixel size to include in MRC header.
overwrite (bool, optional) – If True, overwrite existing files. Defaults to False.
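A hedged sketch of writing a reconstructed map; the file name volume.mrc is an assumption, and the actual upload needs a live connection.

```python
import numpy as np

def save_volume(job, volume, psize):
    """Write a 2D or 3D map to volume.mrc with the given pixel size (Å).
    Illustrative helper; not part of cryosparc-tools."""
    if volume.ndim not in (2, 3):
        raise ValueError("upload_mrc expects a 2D or 3D array")
    job.upload_mrc("volume.mrc", volume.astype(np.float32), psize,
                   overwrite=True)
    return volume.shape
```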
- upload_plot(figure: str | PurePath | IO[bytes] | Any, name: str | None = None, formats: Iterable[Literal['pdf', 'gif', 'jpg', 'jpeg', 'png', 'svg']] = ['png', 'pdf'], raw_data: str | bytes | None = None, raw_data_file: str | PurePath | IO[bytes] | None = None, raw_data_format: Literal['txt', 'csv', 'html', 'json', 'xml', 'bild', 'bld'] | None = None, savefig_kw: dict = {'bbox_inches': 'tight', 'pad_inches': 0}) List[EventLogAsset] #
Upload the given figure. Returns a list of the created asset objects. Avoid using directly; use log_plot instead. See log_plot for additional details.
- Parameters:
figure (str | Path | IO | Figure) – Image file path, file handle or matplotlib figure instance
name (str) – Associated name for given figure
formats (list[ImageFormat], optional) – Image formats to save the plot into. If figure is a file handle, specify formats=['<format>'], where <format> is a valid image extension such as png or pdf; png is assumed if not specified. Defaults to ["png", "pdf"].
raw_data (str | bytes, optional) – Raw text data for the associated plot, generally in CSV, XML or JSON format. Cannot be specified with raw_data_file. Defaults to None.
raw_data_file (str | Path | IO, optional) – Path to raw text data. Cannot be specified with raw_data. Defaults to None.
raw_data_format (TextFormat, optional) – Format of the raw text data. Defaults to None.
savefig_kw (dict, optional) – If a matplotlib figure is specified, optional keyword arguments for its savefig method. Defaults to dict(bbox_inches="tight", pad_inches=0).
- Raises:
ValueError – If incorrect argument specified
- Returns:
Details about the created job assets
- Return type:
list[EventLogAsset]
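The raw_data option can carry the plotted values alongside the image. A hedged sketch follows: figure may be a matplotlib Figure or an image path, and the name fsc_curve plus the CSV column names are assumptions.

```python
def log_fsc_plot(job, figure, resolutions, fsc_values):
    """Upload an FSC curve figure plus its raw CSV data to the job log
    (illustrative helper)."""
    csv_text = "resolution,fsc\n" + "\n".join(
        f"{r},{f}" for r, f in zip(resolutions, fsc_values)
    )
    return job.upload_plot(
        figure,
        name="fsc_curve",
        formats=["png"],
        raw_data=csv_text,       # mutually exclusive with raw_data_file
        raw_data_format="csv",
    )
```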
- wait_for_done(*, error_on_incomplete: bool = False, timeout: int | None = None) str #
Wait until a job reaches status “completed”, “killed” or “failed”.
- Parameters:
error_on_incomplete (bool, optional) – If True, raises an assertion error when job finishes with status other than “completed” or timeout is reached. Defaults to False.
timeout (int, optional) – If specified, wait at most this many seconds. Once the timeout is reached, returns the current status, or fails if error_on_incomplete is True. Defaults to None.
- wait_for_status(status: Literal['building', 'queued', 'launched', 'started', 'running', 'waiting', 'completed', 'killed', 'failed'] | Iterable[Literal['building', 'queued', 'launched', 'started', 'running', 'waiting', 'completed', 'killed', 'failed']], *, timeout: int | None = None) str #
Wait for a job’s status to reach the specified value. Must be one of the following:
‘building’
‘queued’
‘launched’
‘started’
‘running’
‘waiting’
‘completed’
‘killed’
‘failed’
- Parameters:
status (str | set[str]) – Specific status or set of statuses to wait for. If a set of statuses is specified, waits until the job reaches any of them.
timeout (int, optional) – If specified, wait at most this many seconds. Once timeout is reached, returns current status. Defaults to None.
- Returns:
Current job status
- Return type:
str
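The two waiting methods compose naturally: wait_for_status for an intermediate state, then wait_for_done for a terminal one. A sketch assuming a queued job and a live connection; the timeouts are arbitrary.

```python
def watch_job(job, start_timeout=600, finish_timeout=3600):
    """Wait for a job to start running, then block until it reaches a
    terminal status; returns the final status string (illustrative helper)."""
    # wait_for_status accepts a single status or any iterable of statuses:
    job.wait_for_status({"launched", "started", "running"},
                        timeout=start_timeout)
    return job.wait_for_done(error_on_incomplete=False,
                             timeout=finish_timeout)
```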