job

Contents

job#

Defines the JobController and ExternalJobController classes for accessing CryoSPARC jobs.

Classes:

ExternalJobController

Mutable custom output job with customizable input slots and output results.

JobController

Accessor class for a job in CryoSPARC, with the ability to load inputs and outputs, append to the job log, and download job files.

Data:

GROUP_NAME_PATTERN

Input and output result group names may only contain letters, numbers, and underscores.

class ExternalJobController#

Mutable custom output job with customizable input slots and output results. Use external jobs to save cryo-EM data generated by a software package outside of CryoSPARC.

Created external jobs may be connected to any other CryoSPARC job result as an input. Outputs must be created manually and may be configured to pass through inherited input fields, just as with regular CryoSPARC jobs.

Create a new external job with project.create_external_job() or workspace.create_external_job(). ExternalJobController is a subclass of JobController and inherits all its methods and attributes.

Examples

Import multiple exposure groups into a single job

>>> from cryosparc.tools import CryoSPARC
>>> cs = CryoSPARC("http://localhost:61000")
>>> project = cs.find_project("P3")
>>> job = project.create_external_job("W3", title="Import Image Sets")
>>> job.start()
>>> for i in range(3):
...     dset = job.add_output(
...         type="exposure",
...         name=f"images_{i}",
...         slots=["movie_blob", "mscope_params", "gain_ref_blob"],
...         alloc=10  # allocate a dataset for this output with 10 rows
...     )
...     dset['movie_blob/path'] = ...  # populate dataset
...     job.save_output(f"images_{i}", dset)
...
>>> job.stop()

Methods:

__init__(cs, job)

add_input(type[, name, min, max, slots, ...])

Add an input slot to the current job.

add_output()

Add an output slot to the current job.

alloc_output(name[, alloc, dtype_params])

Allocate an empty dataset for the given output with the given name.

connect(target_input, source_job_uid, ...[, ...])

Connect the given input for this job to an output with given job UID and name.

kill()

Kill this job.

queue([lane, hostname, gpus, cluster_vars])

Queue a job to a target lane.

run()

Start a job within a context manager and stop the job when the context ends.

save_output(name, dataset, *[, version, ...])

Save output dataset to external job.

set_output_image(name, image, *[, savefig_kw])

Set the output image for the given output to the given image file or matplotlib Figure.

set_tile_image(image, *[, name, savefig_kw])

Set the job tile image to the given image file or matplotlib Figure.

start([status])

Set job status to "running" or "waiting".

stop([error])

Set job status to "completed" or "failed" if there was an error.

__init__(cs: CryoSPARC, job: Tuple[str, str] | Job) None#
add_input(type: Literal['exposure', 'particle', 'template', 'volume', 'volume_multi', 'mask', 'live', 'ml_model', 'symmetry_candidate', 'flex_mesh', 'flex_model', 'hyperparameter', 'denoise_model', 'annotation_model'], name: str | None = None, min: int = 0, max: int | Literal['inf'] = 'inf', slots: Sequence[str | Slot | Datafield] = [], title: str | None = None, desc: str | None = None)#

Add an input slot to the current job. May be connected to zero or more outputs from other jobs (depending on the min and max values).

Parameters:
  • type (Datatype) – cryo-EM data type for this input, e.g., “particle”

  • name (str, optional) – Input name key, e.g., “input_micrographs”. Same as type if not specified. Defaults to None.

  • min (int, optional) – Minimum number of required input connections. Defaults to 0.

  • max (int | Literal["inf"], optional) – Maximum number of input connections. Specify "inf" for unlimited connections. Defaults to “inf”.

  • slots (list[SlotSpec], optional) – List of slots that should be connected to this input, such as "location" or "blob". When connecting the input, if the source job output is missing these slots, the external job cannot start or accept outputs. Defaults to [].

  • title (str, optional) – Human-readable title for this input. Defaults to name.

  • desc (str, optional) – Human-readable description for this input. Defaults to None.

Raises:
  • CommandError – General CryoSPARC network access error such as a timeout, URL, or HTTP error

  • InvalidSlotsError – slots argument is invalid

Returns:

name of created input

Return type:

str

Examples

Create an external job that accepts micrographs as input:

>>> cs = CryoSPARC("http://localhost:61000")
>>> project = cs.find_project("P3")
>>> job = project.create_external_job("W1", title="Custom Picker")
>>> job.uid
"J3"
>>> job.add_input(
...     type="exposure",
...     name="input_micrographs",
...     min=1,
...     slots=["micrograph_blob", "ctf"],
...     title="Input micrographs for picking"
... )
"input_micrographs"
add_output(type: Literal['exposure', 'particle', 'template', 'volume', 'volume_multi', 'mask', 'live', 'ml_model', 'symmetry_candidate', 'flex_mesh', 'flex_model', 'hyperparameter', 'denoise_model', 'annotation_model'], name: str | None = None, slots: Sequence[str | Slot | Datafield] = [], passthrough: str | None = None, title: str | None = None) str#
add_output(type: Literal['exposure', 'particle', 'template', 'volume', 'volume_multi', 'mask', 'live', 'ml_model', 'symmetry_candidate', 'flex_mesh', 'flex_model', 'hyperparameter', 'denoise_model', 'annotation_model'], name: str | None = None, slots: Sequence[str | Slot | Datafield] = [], passthrough: str | None = None, title: str | None = None, *, alloc: int | Dataset) Dataset

Add an output slot to the current job. Optionally returns the corresponding empty dataset if alloc is specified.

Parameters:
  • type (Datatype) – cryo-EM datatype for this output, e.g., “particle”

  • name (str, optional) – Output name key, e.g., “selected_particles”. Same as type if not specified. Defaults to None.

  • slots (list[SlotSpec], optional) – List of slots expected to be created for this output, such as location or blob. Do not specify any slots that were passed through from an input unless those slots are modified in the output. Defaults to [].

  • passthrough (str, optional) – Indicates that this output inherits slots from an existing input with the specified name. The input must first be added with add_input(). Defaults to None.

  • title (str, optional) – Human-readable title for this output. Defaults to None.

  • alloc (int | Dataset, optional) – If specified, pre-allocate and return a dataset with the requested slots. Specify an integer to allocate a specific number of rows. Specify a Dataset from which to inherit unique row IDs (useful when adding passthrough outputs). Defaults to None.

Raises:
  • CommandError – General CryoSPARC network access error such as a timeout, URL, or HTTP error

  • InvalidSlotsError – slots argument is invalid

Returns:

Name of the created output. If alloc is specified as an integer, instead returns a blank dataset with the given size and random UIDs. If alloc is specified as a Dataset, returns a blank dataset with the same UIDs.

Return type:

str | Dataset

Examples

Create and allocate an output for new particle picks

>>> cs = CryoSPARC("http://localhost:61000")
>>> project = cs.find_project("P3")
>>> job = project.find_external_job("J3")
>>> particles_dset = job.add_output(
...     type="particle",
...     name="picked_particles",
...     slots=["location", "pick_stats"],
...     alloc=10000
... )

Create an inherited output for input micrographs

>>> job.add_output(
...     type="exposure",
...     name="picked_micrographs",
...     passthrough="input_micrographs",
...     title="Passthrough picked micrographs"
... )
"picked_micrographs"

Create an output with multiple slots of the same type

>>> job.add_output(
...     type="particle",
...     name="particle_alignments",
...     slots=[
...         {"name": "alignments_class_0", "dtype": "alignments3D", "required": True},
...         {"name": "alignments_class_1", "dtype": "alignments3D", "required": True},
...         {"name": "alignments_class_2", "dtype": "alignments3D", "required": True},
...     ]
... )
"particle_alignments"
alloc_output(name: str, alloc: int | ArrayLike | Dataset = 0, *, dtype_params: Dict[str, Any] = {}) Dataset#

Allocate an empty dataset for the given output with the given name. Initialize with the given number of empty rows. The result may be used with save_output with the same output name.

Parameters:
  • name (str) – Name of job output to allocate

  • alloc (int | ArrayLike | Dataset, optional) – Specify as one of the following: (A) an integer to allocate a specific number of rows, (B) a numpy array of numbers to use for UIDs in the allocated dataset or (C) a dataset from which to inherit unique row IDs (useful for allocating passthrough outputs). Defaults to 0.

  • dtype_params (dict, optional) – Data type parameters when allocating results with dynamic column sizes such as particle -> alignments3D_multi. Defaults to {}.

Returns:

Empty dataset with the given number of rows

Return type:

Dataset

Examples

Allocate a dataset of size 10,000 for an output for new particle picks

>>> cs = CryoSPARC("http://localhost:61000")
>>> project = cs.find_project("P3")
>>> job = project.find_external_job("J3")
>>> job.alloc_output("picked_particles", 10000)
Dataset([  # 10000 items, 11 fields
    ("uid": [...]),
    ("location/micrograph_path", ["", ...]),
    ...
])

Allocate a dataset from an existing input passthrough dataset

>>> input_micrographs = job.load_input("input_micrographs")
>>> job.alloc_output("picked_micrographs", input_micrographs)
Dataset([  # same "uid" field as input_micrographs
    ("uid": [...]),
])
connect(target_input: str, source_job_uid: str, source_output: str, *, slots: Sequence[str | Slot | Datafield] = [], title: str | None = None, desc: str | None = None, **kwargs) bool#

Connect the given input for this job to an output with given job UID and name. If this input does not exist, it will be added with the given slots.

Parameters:
  • target_input (str) – Input name to connect into. Will be created if it does not already exist.

  • source_job_uid (str) – Job UID to connect from, e.g., “J42”

  • source_output (str) – Job output name to connect from, e.g., “particles”

  • slots (list[SlotSpec], optional) – List of input slots (e.g., “particle” or “blob”) to explicitly require for the created input. If the given source job is missing these slots, the external job cannot start or accept outputs. Defaults to [].

  • title (str, optional) – Human readable title for created input. Defaults to target input name.

  • desc (str, optional) – Human readable description for created input. Defaults to “”.

Raises:
  • CommandError – General CryoSPARC network access error such as a timeout, URL, or HTTP error

  • InvalidSlotsError – slots argument is invalid

Examples

Connect J3 to CTF-corrected micrographs from J2’s micrographs output.

>>> cs = CryoSPARC("http://localhost:61000")
>>> project = cs.find_project("P3")
>>> job = project.find_external_job("J3")
>>> job.connect("input_micrographs", "J2", "micrographs")
kill()#

Kill this job.

queue(lane: str | None = None, hostname: str | None = None, gpus: Sequence[int] = [], cluster_vars: Dict[str, Any] = {})#

Queue a job to a target lane. Available lanes may be queried with cs.get_lanes().

Optionally specify a hostname for a node or cluster in the given lane. Optionally specify specific GPU indices to use for computation.

Available hostnames for a given lane may be queried with cs.get_targets().

Parameters:
  • lane (str, optional) – Configured compute lane to queue to. Leave unspecified to run directly on the master or current workstation. Defaults to None.

  • hostname (str, optional) – Specific hostname in compute lane, if more than one is available. Defaults to None.

  • gpus (list[int], optional) – GPUs to queue to. If specified, must have as many GPUs as required in job parameters. Leave unspecified to use first available GPU(s). Defaults to [].

  • cluster_vars (dict[str, Any], optional) – Specify custom cluster variables when queuing to a cluster. Keys are variable names. Defaults to {}.

Examples

Queue a job to lane named “worker”:

>>> cs = CryoSPARC("http://localhost:61000")
>>> job = cs.find_job("P3", "J42")
>>> job.status
"building"
>>> job.queue("worker")
>>> job.status
"queued"
run()#

Start a job within a context manager and stop the job when the context ends.

Yields:

ExternalJobController – self.

Examples

The job will be marked as “failed” if the contents of the block raise an exception.

>>> with job.run():
...     job.save_output(...)
save_output(name: str, dataset: Dataset, *, version: int = 0, image: str | PurePath | IO[bytes] | Any | None = None, savefig_kw: dict = {'bbox_inches': 'tight', 'pad_inches': 0}, **kwargs)#

Save output dataset to external job.

Parameters:
  • name (str) – Name of output on this job.

  • dataset (Dataset) – Dataset to save as the output’s value; only the output’s required fields are needed.

  • version (int, optional) – Version number, when saving multiple intermediate iterations. Only the last saved version is kept. Defaults to 0.

  • image (str | Path | IO | Figure, optional) – Optional image file or matplotlib Figure to set as the thumbnail for this output. Defaults to None.

  • savefig_kw (dict, optional) – Additional keyword arguments to pass to figure.savefig() when saving matplotlib Figures. Defaults to dict(bbox_inches="tight", pad_inches=0).

Examples

Save a previously-allocated output.

>>> cs = CryoSPARC("http://localhost:61000")
>>> project = cs.find_project("P3")
>>> job = project.find_external_job("J3")
>>> particles = job.alloc_output("picked_particles", 10000)
>>> job.save_output("picked_particles", particles)
set_output_image(name: str, image: str | PurePath | IO[bytes] | Any | GridFSAsset, *, savefig_kw: dict = {'bbox_inches': 'tight', 'pad_inches': 0})#

Set the output image for the given output to the given image file or matplotlib Figure.

Parameters:
  • name (str) – Name of output to set image for.

  • image (str | Path | IO | Figure) – Image file or matplotlib Figure.
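
Examples

A minimal sketch using a matplotlib Figure, assuming the “picked_particles” output from the add_output example above; the histogrammed field name is illustrative:

>>> import matplotlib.pyplot as plt
>>> fig, ax = plt.subplots()
>>> _ = ax.hist(particles_dset["pick_stats/ncc_score"])  # field name is illustrative
>>> job.set_output_image("picked_particles", fig)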

set_tile_image(image: str | PurePath | IO[bytes] | Any | GridFSAsset, *, name: str = 'tile0', savefig_kw: dict = {'bbox_inches': 'tight', 'pad_inches': 0})#

Set the job tile image to the given image file or matplotlib Figure.

Parameters:

image (str | Path | IO | Figure) – Image file or matplotlib Figure.

start(status: Literal['running', 'waiting'] = 'waiting')#

Set job status to “running” or “waiting”.

Parameters:

status (str, optional) – “running” or “waiting”. Defaults to “waiting”.

stop(error: str = '')#

Set job status to “completed” or “failed” if there was an error.

Parameters:

error (str, optional) – Error message; if specified, it is added to the event log and the job status is set to “failed”. Defaults to “”.
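
Examples

A short sketch marking an external job as failed (the error text is illustrative):

>>> job.start()
>>> job.stop(error="Input file not found")  # job status becomes "failed"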

FileOrFigure(*args, **kwargs)#

A file path, a file-like object, or a matplotlib figure.

alias of str | PurePath | IO[bytes] | Any

GROUP_NAME_PATTERN = '^[A-Za-z][0-9A-Za-z_]*$'#

Input and output result group names may only contain letters, numbers, and underscores.

class JobController#

Accessor class for a job in CryoSPARC, with the ability to load inputs and outputs, append to the job log, and download job files. Should be created with cs.find_job() or project.find_job().

Parameters:

job (tuple[str, str] | Job) – either a (Project UID, Job UID) tuple or a Job model, e.g., ("P3", "J42")

model#

All job data from the CryoSPARC database. Contents may change over time, use refresh() to update.

Type:

Job

Examples

Find an existing job.

>>> cs = CryoSPARC("http://localhost:61000")
>>> job = cs.find_job("P3", "J42")
>>> job.status
"building"

Queue a job.

>>> job.queue("worker_lane")
>>> job.status
"queued"

Create a 3-class ab-initio job connected to existing particles.

>>> job = cs.create_job("P3", "W1", "homo_abinit",
...     connections={"particles": ("J20", "particles_selected")},
...     params={"abinit_K": 3}
... )
>>> job.queue()
>>> job.status
"queued"

Attributes:

desc

Job description

dir

Full path to the job directory.

full_spec

The full specification for job inputs, outputs and parameters, as defined in the job register.

inputs

Input connection details.

outputs

Output result details.

params

Job parameter values object.

project_uid

Project unique ID, e.g., "P3"

status

Job scheduling status.

title

Job title

type

Job type key

uid

Job unique ID, e.g., "J42"

Methods:

__init__(cs, job)

clear()

Clear this job and reset to building status.

connect(target_input, source_job_uid, ...)

Connect the given input for this job to an output with given job UID and name.

connect_result(target_input, connection_idx, ...)

Connect a low-level input result slot with a result from another job.

cp(source_path[, target_path])

Copy a file or folder into the job directory.

disconnect(target_input[, connection_idx])

Clear the given job input group.

disconnect_result(target_input, ...)

Clear the job's given input result slot.

download(path)

Initiate a download request for a file inside the job's directory.

download_asset(fileid, target)

Download a job asset from the database with the given ID.

download_dataset(path)

Download a .cs dataset file from the given path in the job directory.

download_file(path[, target])

Download file from job directory to the given target path or writeable file handle.

download_mrc(path)

Download a .mrc file from the given relative path in the job directory.

interact(action[, body, timeout, refresh])

Call an interactive action on a waiting interactive job.

kill()

Kill this job.

list_assets()

Get a list of files available in the database for this job.

list_files([prefix, recursive])

Get a list of files inside the job directory.

load_input(name[, slots])

Load the dataset connected to the job's input with the given name.

load_output(name[, slots, version])

Load the dataset for the job's output with the given name.

log()

Append to a job's event log.

log_checkpoint([meta])

Append a checkpoint to the job's event log.

log_plot(figure, text[, formats, raw_data, ...])

Add a log line with the given figure.

mkdir(target_path[, parents, exist_ok])

Create a folder in the given job.

print_input_spec()

Print a table of input keys, their title, type, connection requirements and details about their low-level required slots.

print_output_spec()

Print a table of output keys, their title, type and details about their low-level results.

print_param_spec()

Print a table of parameter keys, their title, type and default to standard output:

queue([lane, hostname, gpus, cluster_vars])

Queue a job to a target lane.

refresh()

Reload this job from the CryoSPARC database.

set_description(desc)

Set the job description.

set_param(name, value, **kwargs)

Set the given param name on the current job to the given value.

set_title(title)

Set the job title.

subprocess(args[, mute, checkpoint, ...])

Launch a subprocess and write its text-based output and error to the job log.

symlink(source_path[, target_path])

Create a symbolic link in job's directory.

upload(target_path, source, *[, overwrite])

Upload the given file to the job directory at the given path.

upload_dataset(target_path, dset, *[, ...])

Upload a dataset as a CS file into the job directory.

upload_mrc(target_path, data, psize, *[, ...])

Upload a numpy 2D or 3D array to the job directory as an MRC file.

wait_for_done(*[, error_on_incomplete, timeout])

Wait until a job reaches status "completed", "killed" or "failed".

wait_for_status(status, *[, timeout])

Wait for a job's status to reach the specified value.

__init__(cs: CryoSPARC, job: Tuple[str, str] | Job) None#
clear()#

Clear this job and reset to building status.

connect(target_input: str, source_job_uid: str, source_output: str, **kwargs) bool#

Connect the given input for this job to an output with given job UID and name.

Parameters:
  • target_input (str) – Input name to connect into.

  • source_job_uid (str) – Job UID to connect from, e.g., “J42”

  • source_output (str) – Job output name to connect from, e.g., “particles”

Returns:

False if the job encountered a build error.

Return type:

bool

Examples

Connect J3 to CTF-corrected micrographs from J2’s micrographs output.

>>> cs = CryoSPARC("http://localhost:61000")
>>> project = cs.find_project("P3")
>>> job = project.find_job("J3")
>>> job.connect("input_micrographs", "J2", "micrographs")
connect_result(target_input: str, connection_idx: int, slot: str, source_job_uid: str, source_output: str, source_result: str, source_version: int | Literal['F'] = 'F')#

Connect a low-level input result slot with a result from another job.

Parameters:
  • target_input (str) – Input name to connect into, e.g., “particles”

  • connection_idx (int) – Connection index to connect into, use 0 for the job’s first connection on that input, 1 for the second, etc.

  • slot (str) – Input slot name to connect into, e.g., “location”

  • source_job_uid (str) – Job UID to connect from, e.g., “J42”

  • source_output (str) – Job output name to connect from, e.g., “particles_selected”

  • source_result (str) – Result name to connect from, e.g., “location”

Returns:

False if the job encountered a build error.

Return type:

bool

Examples

Connect the location slot of the first connection on J3’s particles input to the location result of J2’s particles_selected output.

>>> cs = CryoSPARC("http://localhost:61000")
>>> project = cs.find_project("P3")
>>> job = project.find_job("J3")
>>> job.connect_result("particles", 0, "location", "J2", "particles_selected", "location")

cp(source_path: str | PurePosixPath, target_path: str | PurePosixPath = '')#

Copy a file or folder into the job directory.

Parameters:
  • source_path (str | Path) – Relative or absolute path of source file or folder to copy. If relative, assumed to be within the job directory.

  • target_path (str | Path, optional) – Name or path in the job directory to copy into. If not specified, uses the same file name as the source. Defaults to “”.
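
Examples

A minimal sketch, assuming the source file exists on the CryoSPARC master file system (both paths are illustrative):

>>> cs = CryoSPARC("http://localhost:61000")
>>> job = cs.find_job("P3", "J42")
>>> job.cp("/data/scratch/results.csv", "results.csv")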

property desc: str#

Job description

property dir: PurePosixPath#

Full path to the job directory.

disconnect(target_input: str, connection_idx: int | None = None, **kwargs)#

Clear the given job input group.

Parameters:
  • target_input (str) – Name of input to disconnect

  • connection_idx (int, optional) – Connection index to clear. Set to 0 to clear the first connection, 1 for the second, etc. If unspecified, clears all connections. Defaults to None.
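
Examples

For example, to clear every connection on a job’s “particles” input (the input name is illustrative):

>>> job.disconnect("particles")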

disconnect_result(target_input: str, connection_idx: int, slot: str)#

Clear the job’s given input result slot.

Parameters:
  • target_input (str) – Name of input to disconnect

  • connection_idx (int) – Connection index to modify. Set to 0 for the first connection, 1 for the second, etc.

  • slot (str) – Input slot name to disconnect, e.g., “location”

Returns:

False if the job encountered a build error.

Return type:

bool

download(path: str | PurePosixPath)#

Initiate a download request for a file inside the job’s directory. Use to get files from a remote CryoSPARC instance where the job directory is not available on the client file system.

Parameters:

path (str | Path) – Name or path of file in job directory.

Yields:

HTTPResponse – Use a context manager to read the file from the request body.

Examples

Download a job’s metadata

>>> cs = CryoSPARC("http://localhost:61000")
>>> job = cs.find_job("P3", "J42")
>>> import json
>>> with job.download("job.json") as res:
...     job_data = json.loads(res.read())
download_asset(fileid: str, target: str | PurePath | IO[bytes])#

Download a job asset from the database with the given ID. Note that the file does not necessarily have to belong to the current job.

Parameters:
  • fileid (str) – GridFS file object ID

  • target (str | Path | IO) – Local file path or writeable file handle to write response data.

Returns:

resulting target path or file handle.

Return type:

str | Path | IO
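
Examples

A minimal sketch that downloads the first listed asset, using list_assets() to look up its ID (the target path is illustrative):

>>> assets = job.list_assets()
>>> target = job.download_asset(assets[0]["_id"], "/tmp/asset.png")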

download_dataset(path: str | PurePosixPath)#

Download a .cs dataset file from the given path in the job directory.

Parameters:

path (str | Path) – Name or path of .cs file in job directory.

Returns:

Loaded dataset instance

Return type:

Dataset
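
Examples

A minimal sketch, assuming a .cs file with this name exists in the job directory:

>>> particles = job.download_dataset("J42_particles.cs")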

download_file(path: str | PurePosixPath, target: str | PurePath | IO[bytes] = '')#

Download file from job directory to the given target path or writeable file handle.

Parameters:
  • path (str | Path) – Name or path of file in job directory.

  • target (str | Path | IO) – Local file path, directory path or writeable file handle to write response data. If not specified, downloads to current working directory with same file name. Defaults to “”.

Returns:

resulting target path or file handle.

Return type:

Path | IO
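
Examples

For example, to download a job’s log file to a local path (a sketch; the file and target names are illustrative):

>>> path = job.download_file("job.log", "/tmp/J42_job.log")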

download_mrc(path: str | PurePosixPath)#

Download a .mrc file from the given relative path in the job directory.

Parameters:

path (str | Path) – Name or path of .mrc file in job directory.

Returns:

MRC file header and data as a numpy array

Return type:

tuple[Header, NDArray]
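
Examples

A minimal sketch, assuming an .mrc file with this name exists in the job directory:

>>> header, volume = job.download_mrc("cryosparc_P3_J42_map.mrc")
>>> volume.shape  # e.g., (128, 128, 128)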

property full_spec: JobRegisterJobSpec#

The full specification for job inputs, outputs and parameters, as defined in the job register.

property inputs: Dict[str, Input]#

Input connection details.

interact(action: str, body: Any = {}, *, timeout: int = 10, refresh: bool = False) Any#

Call an interactive action on a waiting interactive job. The possible actions and expected body depends on the job type.

Parameters:
  • action (str) – Interactive endpoint to call.

  • body (any) – Body parameters for the interactive endpoint. Must be JSON-encodable.

  • timeout (int, optional) – Maximum time to wait for the action to complete, in seconds. Defaults to 10.

  • refresh (bool, optional) – If True, refresh the job document after posting. Defaults to False.
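
Examples

A hedged sketch; valid actions and body schemas depend on the job type, and the action name below is hypothetical:

>>> job.wait_for_status("waiting")
'waiting'
>>> result = job.interact("get_state")  # "get_state" is a hypothetical action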

kill()#

Kill this job.

list_assets() List[GridFSFile]#

Get a list of files available in the database for this job. Returns a list with details about the assets. Each entry is a dict with an _id key which may be used to download the file with the download_asset method.

Returns:

Asset details

Return type:

list[GridFSFile]

list_files(prefix: str | PurePosixPath = '', recursive: bool = False) List[str]#

Get a list of files inside the job directory.

Parameters:
  • prefix (str | Path, optional) – Subdirectory inside job to list. Defaults to “”.

  • recursive (bool, optional) – If True, lists files recursively. Defaults to False.

Returns:

List of file paths relative to the job directory.

Return type:

list[str]
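
Examples

A minimal sketch collecting every file path in the job directory:

>>> files = job.list_files(recursive=True)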

load_input(name: str, slots: Literal['default', 'passthrough', 'all'] | List[str] = 'all')#

Load the dataset connected to the job’s input with the given name.

Parameters:
  • name (str) – Input to load

  • slots (Literal["default", "passthrough", "all"] | list[str], optional) – List of specific slots to load, such as movie_blob or locations, or all slots if not specified (including passthrough). May also specify as keyword. Defaults to “all”.

Raises:

TypeError – If the job doesn’t have the given input or the dataset cannot be loaded.

Returns:

Loaded dataset

Return type:

Dataset
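
Examples

For example, to load only specific slots of the “input_micrographs” input from the add_input example above:

>>> mics = job.load_input("input_micrographs", slots=["micrograph_blob", "ctf"])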

load_output(name: str, slots: Literal['default', 'passthrough', 'all'] | List[str] = 'all', version: int | Literal['F'] = 'F')#

Load the dataset for the job’s output with the given name.

Parameters:
  • name (str) – Output to load

  • slots (Literal["default", "passthrough", "all"] | list[str], optional) – List of specific slots to load, such as movie_blob or locations, or all slots if not specified (including passthrough). May also specify as keyword. Defaults to “all”.

  • version (int | Literal["F"], optional) – Specific output version to load. Use this to load the output at different stages of processing. Leave unspecified to load the final version. Defaults to “F”

Raises:

TypeError – If job does not have any results for the given output

Returns:

Loaded dataset

Return type:

Dataset
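
Examples

A minimal sketch loading specific slots of an output at an intermediate version (the output name, slots, and version are illustrative):

>>> particles = job.load_output("particles_class_0", slots=["blob", "ctf"], version=2)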

log(text: str, *, level: Literal['text', 'warning', 'error'] = 'text') str#
log(text: str, *, level: Literal['text', 'warning', 'error'] = 'text', name: str) str
log(text: str, *, level: Literal['text', 'warning', 'error'] = 'text', id: str) str

Append to a job’s event log. Update an existing log by providing a name or ID.

Parameters:
  • text (str) – Text to log

  • level (str, optional) – Log level (“text”, “warning” or “error”). Defaults to “text”.

  • name (str, optional) – Event name. If called multiple times with the same name, updates that event instead of creating a new one. Named events are reset when logging a checkpoint. Cannot be provided with id. Defaults to None.

  • id (str, optional) – Update a previously-created event log by its ID. Cannot be provided with name. Defaults to None.

Example

Log a warning message to the job log.

>>> job.log("This is a warning", level="warning")

Show a live progress bar in the job log.

>>> from time import sleep
>>> for pct in range(1, 10):
...     # example log: "Progress: [#####-----] 50%"
...     job.log(f"Progress: [{'#' * pct}{'-' * (10 - pct)}] {pct * 10}%", name="progress")
...     sleep(1)
...
>>> job.log("Done!")

Update an existing log event by ID.

>>> event_id = job.log("Starting job processing...")
>>> # do some processing...
>>> job.log("Finished processing", id=event_id)

Returns:

Created log event ID

Return type:

str

log_checkpoint(meta: dict = {})#

Append a checkpoint to the job’s event log. Also resets named events.

Parameters:

meta (dict, optional) – Additional meta information. Defaults to {}.

Returns:

Created checkpoint event ID

Return type:

str
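
Examples

For example, to group subsequent log events under a new checkpoint:

>>> event_id = job.log("Stage 1 complete")
>>> checkpoint_id = job.log_checkpoint()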

log_plot(figure: str | PurePath | IO[bytes] | Any, text: str, formats: Iterable[Literal['pdf', 'gif', 'jpg', 'jpeg', 'png', 'svg']] = ['png', 'pdf'], raw_data: str | bytes | None = None, raw_data_file: str | PurePath | IO[bytes] | None = None, raw_data_format: Literal['txt', 'csv', 'html', 'json', 'xml', 'bild', 'bld', 'log'] | None = None, flags: List[str] = ['plots'], savefig_kw: dict = {'bbox_inches': 'tight', 'pad_inches': 0})#

Add a log line with the given figure.

figure must be one of the following

  • Path to an existing image file in PNG, JPEG, GIF, SVG or PDF format

  • A file handle-like object with the binary data of an image

  • A matplotlib plot

If a matplotlib figure is specified, uploads the plot in PNG and PDF formats. Override the formats argument with formats=['<format1>', '<format2>', ...] to save in different image formats.

If a text version of the given plot is available (e.g., in CSV format), specify raw_data with the full contents or raw_data_file with a path or binary file handle pointing to the contents. The file format is inferred from the file extension or from raw_data_format, defaulting to "txt" if it cannot be determined.

Parameters:
  • figure (str | Path | IO | Figure) – Image file path, file handle or matplotlib figure instance

  • text (str) – Associated description for given figure

  • formats (list[ImageFormat], optional) – Image formats to save the plot into. If figure is a file handle, specify formats=['<format>'], where <format> is a valid image extension such as png or pdf. Assumes png if not specified. Defaults to [“png”, “pdf”].

  • raw_data (str | bytes, optional) – Raw text data for associated plot, generally in CSV, XML or JSON format. Cannot be specified with raw_data_file. Defaults to None.

  • raw_data_file (str | Path | IO, optional) – Path to raw text data. Cannot be specified with raw_data. Defaults to None.

  • raw_data_format (TextFormat, optional) – Format for raw text data. Defaults to None.

  • flags (list[str], optional) – Flags to use for UI rendering. Generally should not be specified. Defaults to [“plots”].

  • savefig_kw (dict, optional) – If a matplotlib figure is specified, additional keyword arguments for its savefig method. Defaults to dict(bbox_inches="tight", pad_inches=0).

Returns:

Created log event ID

Return type:

str
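
Examples

A minimal sketch logging a matplotlib figure together with its raw CSV data (the plotted values are illustrative):

>>> import matplotlib.pyplot as plt
>>> fig, ax = plt.subplots()
>>> _ = ax.plot([0, 1, 2], [3.0, 1.5, 1.2])
>>> event_id = job.log_plot(
...     fig,
...     "Error per iteration",
...     raw_data="iter,err\n0,3.0\n1,1.5\n2,1.2",
...     raw_data_format="csv",
... )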

mkdir(target_path: str | PurePosixPath, parents: bool = False, exist_ok: bool = False)#

Create a folder in the given job.

Parameters:
  • target_path (str | Path) – Name or path of folder to create inside the job directory.

  • parents (bool, optional) – If True, any missing parents are created as needed. Defaults to False.

  • exist_ok (bool, optional) – If True, does not raise an error for existing directories. Still raises if the target path is not a directory. Defaults to False.
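
Examples

For example, to create a nested scratch folder inside the job directory (the path is illustrative):

>>> job.mkdir("scratch/plots", parents=True)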

property outputs: Dict[str, Output]#

Output result details.

property params: Params#

Job parameter values object.

Example

>>> cs = CryoSPARC(...)
>>> job = cs.find_job("P3", "J42")
>>> print(job.type)
"homo_abinit"
>>> print(job.params.abinit_K)
3
print_input_spec()#

Print a table of input keys, their title, type, connection requirements and details about their low-level required slots.

The “Required?” heading also shows the number of outputs that must be connected to the input for this job to run.

Examples

>>> cs = CryoSPARC("http://localhost:61000")
>>> job = cs.find_job("P3", "J42")
>>> job.type
'extract_micrographs_multi'
>>> job.print_input_spec()
Input       | Title       | Type     | Required? | Input Slots     | Slot Types      | Slot Required?
=====================================================================================================
micrographs | Micrographs | exposure | ✓ (1+)    | micrograph_blob | micrograph_blob | ✓
            |             |          |           | mscope_params   | mscope_params   | ✓
            |             |          |           | background_blob | stat_blob       | ✕
            |             |          |           | ctf             | ctf             | ✕
particles   | Particles   | particle | ✕ (0+)    | location        | location        | ✓
            |             |          |           | alignments2D    | alignments2D    | ✕
            |             |          |           | alignments3D    | alignments3D    | ✕
print_output_spec()#

Print a table of output keys, their title, type and details about their low-level results.

Examples

>>> cs = CryoSPARC("http://localhost:61000")
>>> job = cs.find_job("P3", "J42")
>>> job.type
'extract_micrographs_multi'
>>> job.print_output_spec()
Output                 | Title       | Type     | Result Slots           | Result Types    | Passthrough?
=========================================================================================================
micrographs            | Micrographs | exposure | micrograph_blob        | micrograph_blob | ✕
                       |             |          | micrograph_blob_non_dw | micrograph_blob | ✓
                       |             |          | background_blob        | stat_blob       | ✓
                       |             |          | ctf                    | ctf             | ✓
                       |             |          | ctf_stats              | ctf_stats       | ✓
                       |             |          | mscope_params          | mscope_params   | ✓
particles              | Particles   | particle | blob                   | blob            | ✕
                       |             |          | ctf                    | ctf             | ✕
print_param_spec()#

Print a table of parameter keys, their title, type and default to standard output:

Examples

>>> cs = CryoSPARC("http://localhost:61000")
>>> job = cs.find_job("P3", "J42")
>>> job.type
'extract_micrographs_multi'
>>> job.print_param_spec()
Param                       | Title                 | Type    | Default
=======================================================================
box_size_pix                | Extraction box size   | integer | 256
bin_size_pix                | Fourier crop box size | integer | None
compute_num_gpus            | Number of GPUs        | integer | 1
...
project_uid: str#

Project unique ID, e.g., “P3”

queue(lane: str | None = None, hostname: str | None = None, gpus: List[int] = [], cluster_vars: Dict[str, Any] = {})#

Queue a job to a target lane. Available lanes may be queried with cs.get_lanes().

Optionally specify a hostname for a node or cluster in the given lane. Optionally specify specific GPU indices to use for computation.

Available hostnames for a given lane may be queried with cs.get_targets().

Parameters:
  • lane (str, optional) – Configured compute lane to queue to. Leave unspecified to run directly on the master or current workstation. Defaults to None.

  • hostname (str, optional) – Specific hostname in compute lane, if more than one is available. Defaults to None.

  • gpus (list[int], optional) – GPUs to queue to. If specified, must have as many GPUs as required in job parameters. Leave unspecified to use first available GPU(s). Defaults to [].

  • cluster_vars (dict[str, Any], optional) – Specify custom cluster variables when queuing to a cluster. Keys are variable names. Defaults to {}.

Examples

Queue a job to lane named “worker”:

>>> cs = CryoSPARC("http://localhost:61000")
>>> job = cs.find_job("P3", "J42")
>>> job.status
"building"
>>> job.queue("worker")
>>> job.status
"queued"
refresh()#

Reload this job from the CryoSPARC database.

Returns:

self

Return type:

JobController

set_description(desc: str)#

Set the job description. May include Markdown formatting.

Parameters:

desc (str) – New job description

set_param(name: str, value: Any, **kwargs) bool#

Set the given param name on the current job to the given value. Only works if the job is in “building” status.

Parameters:
  • name (str) – Param name, as defined in the job document’s params_base.

  • value (any) – Target parameter value.

Returns:

False if the job encountered a build error.

Return type:

bool

Examples

Set the number of GPUs used by a supported job

>>> cs = CryoSPARC("http://localhost:61000")
>>> job = cs.find_job("P3", "J42")
>>> job.set_param("compute_num_gpus", 4)
True
set_title(title: str)#

Set the job title.

Parameters:

title (str) – New job title

property status: Literal['building', 'queued', 'launched', 'started', 'running', 'waiting', 'completed', 'killed', 'failed']#

Job scheduling status.

subprocess(args: str | list, mute: bool = False, checkpoint: bool = False, checkpoint_line_pattern: str | Pattern[str] | None = None, **kwargs)#

Launch a subprocess and write its text-based output and error to the job log.

Parameters:
  • args (str | list) – Process arguments to run

  • mute (bool, optional) – If True, does not also forward process output to standard output. Defaults to False.

  • checkpoint (bool, optional) – If True, creates a checkpoint in the job event log just before process output begins. Defaults to False.

  • checkpoint_line_pattern (str | Pattern[str], optional) – Regular expression to match checkpoint lines for processes with a lot of output. If a process outputs a line that matches this pattern, a checkpoint is created in the event log before this line is forwarded. Defaults to None.

  • **kwargs – Additional keyword arguments for subprocess.Popen.

Raises:
  • TypeError – For invalid arguments

  • RuntimeError – If process exits with non-zero status code
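
Examples

A minimal sketch forwarding a shell command’s output to the job log (the command is illustrative):

>>> job.subprocess(["echo", "Hello from an external tool"], checkpoint=True)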

symlink(source_path: str | PurePosixPath, target_path: str | PurePosixPath = '')#

Create a symbolic link in the job's directory.

Parameters:
  • source_path (str | Path) – Relative or absolute path of source file or folder to create a link to. If relative, assumed to be within the job directory.

  • target_path (str | Path) – Name or path of new symlink in the job directory. If not specified, creates link with the same file name as the source. Defaults to “”.

property title: str#

Job title

property type: str#

Job type key

uid: str#

Job unique ID, e.g., “J42”

upload(target_path: str | PurePosixPath, source: str | bytes | PurePath | IO, *, overwrite: bool = False)#

Upload the given file to the job directory at the given path. Fails if the target already exists, unless overwrite is True.

Parameters:
  • target_path (str | Path) – Name or path of file to write in job directory.

  • source (str | bytes | Path | IO) – Local path or file handle to upload. May also be specified as raw bytes.

  • overwrite (bool, optional) – If True, overwrite existing files. Defaults to False.
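
Examples

A short sketch uploading raw bytes as a text file (the file name and contents are illustrative):

>>> job.upload("notes.txt", b"Processed with a custom picker")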

upload_dataset(target_path: str | PurePosixPath, dset: Dataset, *, format: int = 1, overwrite: bool = False)#

Upload a dataset as a CS file into the job directory. Fails if the target already exists, unless overwrite is True.

Parameters:
  • target_path (str | Path) – Name or path of dataset to save in the job directory. Should have a .cs extension.

  • dset (Dataset) – Dataset to save.

  • format (int, optional) – Format to save in, one of the cryosparc.dataset.*_FORMAT constants. Defaults to NUMPY_FORMAT.

  • overwrite (bool, optional) – If True, overwrite existing files. Defaults to False.
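
Examples

A minimal sketch that saves a previously-loaded output dataset into the job directory (the file name is illustrative):

>>> particles = job.load_output("picked_particles")
>>> job.upload_dataset("picked_particles.cs", particles, overwrite=True)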

upload_mrc(target_path: str | PurePosixPath, data: NDArray, psize: float, *, overwrite: bool = False)#

Upload a numpy 2D or 3D array to the job directory as an MRC file. Fails if the target already exists, unless overwrite is True.

Parameters:
  • target_path (str | Path) – Name or path of MRC file to save in the job directory. Should have a .mrc extension.

  • data (NDArray) – Numpy array with MRC file data.

  • psize (float) – Pixel size to include in MRC header.

  • overwrite (bool, optional) – If True, overwrite existing files. Defaults to False.
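
Examples

A minimal sketch writing a blank volume (the array contents, path, and pixel size are illustrative):

>>> import numpy as np
>>> vol = np.zeros((64, 64, 64), dtype=np.float32)
>>> job.upload_mrc("scratch/blank_volume.mrc", vol, psize=1.2)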

wait_for_done(*, error_on_incomplete: bool = False, timeout: int | None = None) str#

Wait until a job reaches status “completed”, “killed” or “failed”.

Parameters:
  • error_on_incomplete (bool, optional) – If True, raises an assertion error when the job finishes with a status other than “completed” or when the timeout is reached. Defaults to False.

  • timeout (int, optional) – If specified, wait at most this many seconds. Once timeout is reached, returns current status or fails if error_on_incomplete is True. Defaults to None.
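
Examples

A short sketch that queues a job and blocks until it finishes (the echoed status is illustrative):

>>> job.queue()
>>> job.wait_for_done(error_on_incomplete=True)
'completed'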

wait_for_status(status: Literal['building', 'queued', 'launched', 'started', 'running', 'waiting', 'completed', 'killed', 'failed'] | Iterable[Literal['building', 'queued', 'launched', 'started', 'running', 'waiting', 'completed', 'killed', 'failed']], *, timeout: int | None = None) str#

Wait for a job’s status to reach the specified value. Must be one of the following:

  • ‘building’

  • ‘queued’

  • ‘launched’

  • ‘started’

  • ‘running’

  • ‘waiting’

  • ‘completed’

  • ‘killed’

  • ‘failed’

Parameters:
  • status (str | set[str]) – Specific status or set of statuses to wait for. If a set of statuses is specified, waits until the job reaches any of the specified statuses.

  • timeout (int, optional) – If specified, wait at most this many seconds. Once timeout is reached, returns current status. Defaults to None.

Returns:

current job status

Return type:

str
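
Examples

A short sketch waiting for a queued job to start (the echoed status is illustrative):

>>> job.queue()
>>> job.wait_for_status({"launched", "started", "running"}, timeout=600)
'running'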

LogLevel(*args, **kwargs)#

Severity level for job event logs.

alias of Literal[‘text’, ‘warning’, ‘error’]