cryosparc.dataset#
Classes and utilities for loading, saving and working with .cs Dataset files. A pure-C interface to dataset handles is also available.
A Dataset can represent anything: particles, volumes, micrographs, etc.
A Result is a dataset plus field names and other metadata.
Datasets are lightweight: multiple may be in use at any time, e.g., one per micrograph during picking.
The only required field is uid. This field is automatically added to every new dataset.
Datasets may be created in the following ways:
- allocated empty with a specific size and field definitions
- from an existing data source that already has uids (e.g., a file or record array)
- by appending datasets to each other or joining them on uid
Dataset supports:
- adding new rows (via appending)
- adding new fields
- joining fields from another dataset on UID
Data:
- CSDAT_FORMAT – Compressed stream .cs file format.
- DEFAULT_FORMAT – Default save .cs file format.
- NEWEST_FORMAT – Newest save .cs file format.
- NUMPY_FORMAT – Numpy-array .cs file format.
Classes:
- Dataset – Accessor class for working with CryoSPARC .cs files.
Functions:
- generate_uids – Generate the given number of random 64-bit unsigned integer uids.
- cryosparc.dataset.CSDAT_FORMAT = 2#
Compressed stream .cs file format. Same as NEWEST_FORMAT.
- cryosparc.dataset.DEFAULT_FORMAT = 1#
Default save .cs file format. Same as NUMPY_FORMAT.
- class cryosparc.dataset.Dataset(allocate: int | Dataset[Any] | NDArray | Data | Mapping[str, ArrayLike] | List[Tuple[str, ArrayLike]] | None = 0, row_class=<class 'cryosparc.row.Row'>)#
Accessor class for working with CryoSPARC .cs files.
A dataset may be initialized with Dataset(data) where data is one of the following:
- A size of items to allocate (e.g., 42)
- A mapping from column names to their contents (dict or tuple list)
- A numpy record array
- Parameters:
Examples
Initialize a dataset
>>> dset = Dataset([
...     ("uid", [1, 2, 3]),
...     ("dat1", ["Hello", "World", "!"]),
...     ("dat2", [3.14, 2.71, 1.61])
... ])
>>> dset.descr()
[('uid', '<u8'), ('dat1', '|O'), ('dat2', '<f8')]
Load a dataset from disk
>>> from cryosparc.dataset import Dataset
>>> dset = Dataset.load('/path/to/particles.cs')
>>> for particle in dset.rows():
...     print(
...         f"Particle located in file {particle['blob/path']} "
...         f"at index {particle['blob/idx']}")
Methods:
- add_fields(fields[, dtypes]) – Adds the given fields to the dataset.
- allocate([size, fields]) – Allocate a dataset with the given number of rows and specified fields.
- append(*others[, assert_same_fields, ...]) – Concatenate many datasets together into one new one.
- append_many(*datasets[, assert_same_fields, ...]) – Similar to Dataset.append.
- cols() – Get current dataset columns, organized by field.
- common_fields(*datasets[, assert_same_fields]) – Get a list of fields common to all given datasets.
- copy() – Create a deep copy of the current dataset.
- copy_fields(old_fields, new_fields) – Copy the values at the given old fields into the new fields, allocating them if necessary.
- descr([exclude_uid]) – Get numpy-compatible description for dataset fields.
- drop_fields(names, *[, copy]) – Remove the given field names from the dataset.
- extend(*others[, repeat_allowed]) – Add the given dataset(s) to the end of the current dataset.
- fields([exclude_uid]) – Get a list of field names available in this dataset.
- filter_fields(names, *[, copy]) – Keep only the given fields from the dataset.
- filter_prefix(keep_prefix, *[, rename, copy]) – Similar to filter_prefixes but for a single prefix.
- filter_prefixes(prefixes, *[, copy]) – Similar to filter_fields, except takes a list of prefixes.
- from_async_stream(stream) – Asynchronously load from the given binary stream.
- handle() – Numeric dataset handle for working with the dataset via C APIs (documentation is not yet available).
- innerjoin(*others[, assert_no_drop]) – Create a new dataset with fields from all provided datasets, including only rows common to all of them (based on UID).
- innerjoin_many(*datasets) – Similar to Dataset.innerjoin.
- inspect(file) – Given a path to a dataset file, get information included in its header.
- interlace(*datasets[, assert_same_fields]) – Combine the current dataset with one or more datasets of the same length by alternating rows from each dataset.
- is_equivalent(other) – Check whether two datasets contain the same data, regardless of field order.
- load(file, *[, prefixes, fields, cstrs]) – Read a dataset from a path or file handle.
- mask(mask) – Get a subset of the dataset that matches the given boolean mask of rows.
- prefixes() – List of field prefixes available in this dataset, assuming fields have the format {prefix}/{field}.
- query(query) – Get a subset of data based on whether the fields match the values in the given query.
- query_mask(query[, invert]) – Get a boolean array representing the items to keep in the dataset that match the given query filter.
- reassign_uids() – Reset all values of the uid column to new unique random values.
- rename_field(current_name, new_name, *[, copy]) – Change the name of a dataset field.
- rename_fields(field_map, *[, copy]) – Change the names of dataset fields based on the given mapping.
- rename_prefix(old_prefix, new_prefix, *[, copy]) – Similar to rename_fields, except changes the prefix of all fields with the given old_prefix to new_prefix.
- replace(query, *others[, assume_disjoint, ...]) – Replace values matching the given query with others.
- rows() – A row-by-row accessor list for items in this dataset.
- save(file[, format]) – Save a dataset to the given path or I/O buffer.
- slice([start, stop, step]) – Get a subset of the dataset with rows in the given range.
- split_by(field) – Create a mapping from possible values of the given field to datasets filtered by rows with that value.
- stream([compression]) – Generate a binary representation for this dataset.
- subset(rows) – Get a subset of the dataset that only includes the given list of rows (from this dataset).
- take(indices) – Get a subset of data with only the matching list of row indices.
- to_cstrs(*[, copy]) – Convert all Python string columns to C strings.
- to_list([exclude_uid]) – Convert to a list of lists, each value of the outer list representing one dataset row.
- to_pystrs(*[, copy]) – Convert all C string columns to Python strings.
- to_records([fixed]) – Convert to a numpy record array.
- union(*others[, assert_same_fields, ...]) – Take the row union of all the given datasets, based on their uid fields.
- union_many(*datasets[, assert_same_fields, ...]) – Similar to Dataset.union.
- add_fields(fields: Sequence[Tuple[str, str] | Tuple[str, str, Tuple[int, ...]]]) Dataset[R] #
- add_fields(fields: Sequence[str], dtypes: str | Sequence['DTypeLike']) Dataset[R]
Adds the given fields to the dataset. If a field with the same name already exists, that field will not be added (even if types don’t match). Fields are initialized with zeros (or “” for object fields).
- Parameters:
fields (list[str] | list[Field]) – Field names or descriptions to add. If a list of names is specified, the second dtypes argument must also be specified.
dtypes (str | list[DTypeLike], optional) – String with comma-separated data type names or list of data types. Must be specified if the fields argument is a list of strings. Defaults to None.
- Returns:
self with added fields
- Return type:
Examples
>>> dset = Dataset(3)
>>> dset.add_fields(
...     ['foo', 'bar'],
...     ['u8', ('f4', (2,))]
... )
Dataset([
    ('uid', [14727850622008419978 309606388100339041 15935837513913527085]),
    ('foo', [0 0 0]),
    ('bar', [[0. 0.] [0. 0.] [0. 0.]]),
])
>>> dset.add_fields([('baz', "O")])
Dataset([
    ('uid', [14727850622008419978 309606388100339041 15935837513913527085]),
    ('foo', [0 0 0]),
    ('bar', [[0. 0.] [0. 0.] [0. 0.]]),
    ('baz', ["" "" ""]),
])
- classmethod allocate(size: int = 0, fields: Sequence[Tuple[str, str] | Tuple[str, str, Tuple[int, ...]]] = [])#
Allocate a dataset with the given number of rows and specified fields.
- Parameters:
size (int, optional) – Number of rows to allocate. Defaults to 0.
fields (list[Field], optional) – Initial fields, excluding uid. Defaults to [].
- Returns:
Empty dataset
- Return type:
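Examples
A minimal sketch of allocating an empty dataset with two extra fields (the field names here are hypothetical):
>>> dset = Dataset.allocate(size=3, fields=[("ctf/defocus", "f4"), ("blob/path", "O")])
>>> len(dset)
3
>>> dset.fields()
['uid', 'ctf/defocus', 'blob/path']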
- append(*others: Dataset, assert_same_fields=False, repeat_allowed=False)#
Concatenate many datasets together into one new one.
May be called either as an instance method or an initializer to create a new dataset from one or more datasets.
To initialize from zero or more datasets, use Dataset.append_many.
- Parameters:
assert_same_fields (bool, optional) – If not set or False, appends only common dataset fields. If True, fails when the inputs don't have all fields in common. Defaults to False.
repeat_allowed (bool, optional) – If True, does not fail when there are duplicate UIDs. Defaults to False.
- Returns:
appended dataset
- Return type:
Examples
As an instance method
>>> dset = d1.append(d2, d3)
As a class method
>>> dset = Dataset.append(d1, d2, d3)
- classmethod append_many(*datasets: Dataset, assert_same_fields=False, repeat_allowed=False)#
Similar to Dataset.append. If no datasets are provided, returns an empty Dataset with just the uid field.
- Parameters:
assert_same_fields (bool, optional) – Same as for the append method. Defaults to False.
repeat_allowed (bool, optional) – Same as for the append method. Defaults to False.
- Returns:
Appended dataset
- Return type:
- cols() Dict[str, Column] #
Get current dataset columns, organized by field.
- Returns:
Columns
- Return type:
dict[str, Column]
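Examples
A small sketch of inspecting columns; the field name is hypothetical, and column values are assumed here to behave like numpy arrays:
>>> dset = Dataset({"uid": [1, 2, 3], "ctf/defocus": [1.0, 2.0, 3.0]})
>>> cols = dset.cols()
>>> sorted(cols)
['ctf/defocus', 'uid']
>>> float(cols["ctf/defocus"].mean())
2.0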
- classmethod common_fields(*datasets: Dataset, assert_same_fields=False) List[Tuple[str, str] | Tuple[str, str, Tuple[int, ...]]] #
Get a list of fields common to all given datasets.
- Parameters:
assert_same_fields (bool, optional) – If True, fails if datasets don’t all share the same fields. Defaults to False.
- Returns:
List of dataset fields and their data types.
- Return type:
list[Field]
- copy_fields(old_fields: List[str], new_fields: List[str])#
Copy the values at the given old fields into the new fields, allocating them if necessary.
- Parameters:
old_fields (List[str]) – Names of old fields to copy from
new_fields (List[str]) – Names of new fields to copy to
- Returns:
current dataset with modified fields
- Return type:
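Examples
A minimal sketch of duplicating one column under a new name (field names hypothetical; the exact repr of the returned dataset may differ):
>>> dset = Dataset([("uid", [1, 2, 3]), ("alignments3D/shift", [0.5, 0.25, 0.0])])
>>> dset.copy_fields(["alignments3D/shift"], ["alignments3D/shift_orig"])
Dataset(...)
>>> dset.fields()
['uid', 'alignments3D/shift', 'alignments3D/shift_orig']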
- descr(exclude_uid=False) List[Tuple[str, str] | Tuple[str, str, Tuple[int, ...]]] #
Get numpy-compatible description for dataset fields.
- Parameters:
exclude_uid (bool, optional) – If True, uid field will not be included. Defaults to False.
- Returns:
Fields
- Return type:
list[Field]
- drop_fields(names: Collection[str] | Callable[[str], bool], *, copy: bool = False)#
Remove the given field names from the dataset. Provide a list of fields or a function that takes a field name and returns True if that field should be removed.
- Parameters:
names (list[str] | (str) -> bool) – Collection of fields to remove or function that takes a field name and returns True if that field should be removed
copy (bool, optional) – If True, return a copy of dataset rather than mutate. Defaults to False.
- Returns:
current dataset or copy with fields removed
- Return type:
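Examples
A small sketch of dropping fields with a predicate function (field names hypothetical):
>>> dset = Dataset([
...     ("uid", [1, 2]),
...     ("blob/path", ["a.mrc", "b.mrc"]),
...     ("ctf/defocus", [1.2, 1.4]),
... ])
>>> dset.drop_fields(lambda f: f.startswith("ctf/")).fields()
['uid', 'blob/path']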
- extend(*others: Dataset, repeat_allowed=False)#
Add the given dataset(s) to the end of the current dataset. Other datasets must have at least the same fields as the current dataset.
- Parameters:
repeat_allowed (bool, optional) – If True, does not fail when there are duplicate UIDs. Defaults to False.
- Returns:
current dataset with others appended
- Return type:
Examples
>>> len(d1), len(d2), len(d3)
(42, 3, 5)
>>> d1.extend(d2, d3)
Dataset(...)
>>> len(d1)
50
- fields(exclude_uid=False) List[str] #
Get a list of field names available in this dataset.
- Parameters:
exclude_uid (bool, optional) – If True, uid field will not be included. Defaults to False.
- Returns:
List of field names
- Return type:
list[str]
- filter_fields(names: Collection[str] | Callable[[str], bool], *, copy: bool = False)#
Keep only the given fields from the dataset. Provide a list of fields or a function that returns True if a given field name should be kept.
- Parameters:
names (list[str] | (str) -> bool) – Collection of fields to keep or function that takes a field name and returns True if that field should be kept
copy (bool, optional) – If True, return a copy of the dataset rather than mutate. Defaults to False.
- Returns:
current dataset or copy with filtered fields
- Return type:
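Examples
A minimal sketch of keeping only selected fields (field names hypothetical); uid is assumed to always be retained since it is a required field:
>>> dset = Dataset([
...     ("uid", [1, 2]),
...     ("blob/path", ["a.mrc", "b.mrc"]),
...     ("blob/idx", [0, 1]),
...     ("ctf/defocus", [1.2, 1.4]),
... ])
>>> dset.filter_fields(["blob/path", "blob/idx"]).fields()
['uid', 'blob/path', 'blob/idx']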
- filter_prefix(keep_prefix: str, *, rename: str | None = None, copy: bool = False)#
Similar to filter_prefixes but for a single prefix.
- Parameters:
keep_prefix (str) – Prefix to keep.
rename (str, optional) – If specified, rename prefix to this prefix. Defaults to None.
copy (bool, optional) – If True, return a copy of the dataset rather than mutate. Defaults to False.
- Returns:
current dataset or copy with filtered prefix
- Return type:
- filter_prefixes(prefixes: Collection[str], *, copy: bool = False)#
Similar to filter_fields, except takes a list of prefixes.
- Parameters:
prefixes (list[str]) – Prefixes to keep
copy (bool, optional) – If True, return a copy of the dataset rather than mutate. Defaults to False.
- Returns:
current dataset or copy with filtered prefixes
- Return type:
Examples
>>> dset = Dataset([
...     ('uid', [123, 456, 789]),
...     ('field', [0, 0, 0]),
...     ('foo/one', [1, 2, 3]),
...     ('foo/two', [4, 5, 6]),
...     ('bar/one', ['Hello', 'World', '!']),
... ])
>>> dset.filter_prefixes(['foo'])
Dataset([
    ('uid', [123 456 789]),
    ('foo/one', [1 2 3]),
    ('foo/two', [4 5 6]),
])
- async classmethod from_async_stream(stream: AsyncBinaryIO)#
Asynchronously load from the given binary stream. The given stream parameter must at least have an async read(n: int | None) -> bytes method.
- handle() int #
Numeric dataset handle for working with the dataset via C APIs (documentation is not yet available).
- Returns:
Dataset handle that may be used with the C API defined in <cryosparc-tools/dataset.h>
- Return type:
int
- innerjoin(*others: Dataset, assert_no_drop=False)#
Create a new dataset with fields from all provided datasets, including only rows common to all provided datasets (based on UID).
May be called either as an instance method or an initializer to create a new dataset from one or more datasets.
To initialize from zero or more datasets, use Dataset.innerjoin_many.
- Parameters:
assert_no_drop (bool, optional) – Set to True to ensure the provided datasets include at least all UIDs from the first dataset. Defaults to False.
- Returns:
combined dataset.
- Return type:
Examples
As instance method
>>> dset = d1.innerjoin(d2, d3)
As class method
>>> dset = Dataset.innerjoin(d1, d2, d3)
- classmethod innerjoin_many(*datasets: Dataset)#
Similar to Dataset.innerjoin. If no datasets are provided, returns an empty Dataset with just the uid field.
- Returns:
combined dataset
- Return type:
- classmethod inspect(file: str | PurePath) DatasetHeader #
Given a path to a dataset file, get information included in its header.
- Parameters:
file (str | Path) – Readable file path.
- Returns:
Dictionary with dataset header information
- Return type:
- interlace(*datasets: Dataset, assert_same_fields=False)#
Combine the current dataset with one or more datasets of the same length by alternating rows from each dataset.
- Parameters:
assert_same_fields (bool, optional) – If True, fails if not all given datasets have the same fields. Otherwise result only includes common fields. Defaults to False.
- Returns:
combined dataset
- Return type:
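Examples
A minimal sketch, assuming rows alternate starting from the current dataset:
>>> a = Dataset([("uid", [1, 3, 5])])
>>> b = Dataset([("uid", [2, 4, 6])])
>>> a.interlace(b).to_list()
[[1], [2], [3], [4], [5], [6]]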
- is_equivalent(other: object)#
Check whether two datasets contain the same data, regardless of field order.
- Parameters:
other (object) – dataset to compare
- Returns:
True or False
- Return type:
bool
- classmethod load(file: str | PurePath | IO[bytes], *, prefixes: Sequence[str] | None = None, fields: Sequence[str] | None = None, cstrs: bool = False)#
Read a dataset from path or file handle.
If given a file handle pointing to data in the usual numpy array format (i.e., created by numpy.save()), then the handle must be seekable. This restriction does not apply when loading the newer CSDAT_FORMAT.
file (str | Path | IO) – Readable file path or handle. Must be seekable if loading a dataset saved in the default NUMPY_FORMAT.
prefixes (list[str], optional) – Which field prefixes to load. If not specified, loads all prefixes (or only the fields given by fields).
fields (list[str], optional) – Which fields to load. If not specified, loads all fields (or only the prefixes given by prefixes).
cstrs (bool) – If True, load internal string columns as C strings instead of Python strings. Defaults to False.
- Raises:
DatasetLoadError – If the dataset file cannot be loaded.
- Returns:
loaded dataset.
- Return type:
- mask(mask: List[bool] | NDArray)#
Get a subset of the dataset that matches the given boolean mask of rows.
- Parameters:
mask (list[bool] | NDArray[bool]) – mask to keep. Must match length of current dataset.
- Returns:
subset with only matching rows
- Return type:
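Examples
A small sketch of building a mask from a column comparison; the score field is hypothetical and column-style access dset["score"] is assumed:
>>> import numpy as np
>>> dset = Dataset([("uid", [1, 2, 3, 4]), ("score", [0.1, 0.9, 0.5, 0.7])])
>>> keep = np.asarray(dset["score"]) > 0.6
>>> len(dset.mask(keep))
2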
- prefixes() List[str] #
List of field prefixes available in this dataset, assuming fields have the format {prefix}/{field}.
- Returns:
List of prefixes
- Return type:
list[str]
Examples
>>> dset = Dataset({
...     'uid': [123, 456, 789],
...     'field': [0, 0, 0],
...     'foo/one': [1, 2, 3],
...     'foo/two': [4, 5, 6],
...     'bar/one': ["Hello", "World", "!"]
... })
>>> dset.prefixes()
["field", "foo", "bar"]
- query(query: Dict[str, ArrayLike] | Callable[[R], bool])#
Get a subset of data based on whether the fields match the values in the given query. The query is either a test function that is called on each row or a key/value map of allowed field values.
Each value of a query dictionary may either be a single scalar value or a collection of matching values.
If any field is not in the dataset, it is ignored and all data is kept.
Note
Specifying a query function is very slow for large datasets.
- Parameters:
query (dict[str, ArrayLike] | (Row) -> bool) – Query description or row test function.
- Returns:
Subset matching the given query
- Return type:
Examples
With a query dictionary
>>> dset.query({
...     'uid': [123456789, 987654321],
...     'micrograph_blob/path': '/path/to/exposure.mrc'
... })
Dataset(...)
With a function (not recommended)
>>> dset.query(
...     lambda row:
...         row['uid'] in [123456789, 987654321] and
...         row['micrograph_blob/path'] == '/path/to/exposure.mrc'
... )
Dataset(...)
- query_mask(query: Dict[str, ArrayLike], invert=False) NDArray[n.bool_] #
Get a boolean array representing the items to keep in the dataset that match the given query filter. See the query method for an example query format.
query (dict[str, ArrayLike]) – Query description
invert (bool, optional) – If True, returns mask with all items negated. Defaults to False.
- Returns:
Query mask; may be used with the mask() method.
NDArray[bool]
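Examples
A minimal sketch combining query_mask() with mask() (field names hypothetical):
>>> dset = Dataset([
...     ("uid", [1, 2, 3]),
...     ("blob/path", ["a.mrc", "b.mrc", "a.mrc"]),
... ])
>>> m = dset.query_mask({"blob/path": "a.mrc"})
>>> m.tolist()
[True, False, True]
>>> len(dset.mask(m))
2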
- reassign_uids()#
Reset all values of the uid column to new unique random values.
- Returns:
current dataset with modified UIDs
- Return type:
- rename_field(current_name: str, new_name: str, *, copy: bool = False)#
Change the name of a dataset field.
- Parameters:
current_name (str) – Old field name.
new_name (str) – New field name.
copy (bool, optional) – If True, return a copy of the dataset rather than mutate. Defaults to False.
- Returns:
current dataset or copy with fields renamed
- Return type:
- rename_fields(field_map: Dict[str, str] | Callable[[str], str], *, copy: bool = False)#
Change the name of dataset fields based on the given mapping.
- Parameters:
field_map (dict[str, str] | (str) -> str) – Field mapping function or dictionary
copy (bool, optional) – If True, return a copy of the dataset rather than mutate. Defaults to False.
- Returns:
current dataset or copy with fields renamed
- Return type:
- rename_prefix(old_prefix: str, new_prefix: str, *, copy: bool = False)#
Similar to rename_fields, except changes the prefix of all fields with the given old_prefix to new_prefix.
- Parameters:
old_prefix (str) – old prefix to rename
new_prefix (str) – new prefix
copy (bool, optional) – If True, return a copy of the dataset rather than mutate. Defaults to False.
- Returns:
current dataset or copy with renamed prefix.
- Return type:
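Examples
A small sketch renaming every field under one prefix (field names hypothetical):
>>> dset = Dataset([
...     ("uid", [1, 2]),
...     ("movie_blob/path", ["a.tif", "b.tif"]),
...     ("movie_blob/psize_A", [0.6, 0.6]),
... ])
>>> dset.rename_prefix("movie_blob", "micrograph_blob").fields()
['uid', 'micrograph_blob/path', 'micrograph_blob/psize_A']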
- replace(query: Dict[str, ArrayLike], *others: Dataset, assume_disjoint=False, assume_unique=False)#
Replaces values matching the given query with others. The query is a key/value map of allowed field values. The values may be either a single scalar value or a set of possible values. If nothing matches the query (e.g., {} specified), works the same way as append.
All given datasets must have the same fields.
- Parameters:
query (dict[str, ArrayLike]) – Query description.
assume_disjoint (bool, optional) – If True, assumes given datasets do not share any uid values. Defaults to False.
assume_unique (bool, optional) – If True, assumes each given dataset has no duplicate uid values. Defaults to False.
- Returns:
- subset with rows matching query removed and other datasets
appended at the end
- Return type:
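Examples
A minimal sketch (field names hypothetical; column-style access result["blob/path"] is assumed). Rows matching the query are removed and the replacement rows are appended:
>>> dset = Dataset([("uid", [1, 2, 3]), ("blob/path", ["a.mrc", "a.mrc", "b.mrc"])])
>>> fixed = Dataset([("uid", [4, 5]), ("blob/path", ["a_fixed.mrc", "a_fixed.mrc"])])
>>> result = dset.replace({"blob/path": "a.mrc"}, fixed)
>>> len(result)
3
>>> sorted(result["blob/path"])
['a_fixed.mrc', 'a_fixed.mrc', 'b.mrc']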
- rows() Spool[R] #
A row-by-row accessor list for items in this dataset.
- Returns:
List-like row accessor
- Return type:
Examples
>>> dset = Dataset.load('/path/to/dataset.cs')
>>> for row in dset.rows():
...     print(row.to_dict())
- save(file: str | PurePath | IO[bytes], format: int = 1)#
Save a dataset to the given path or I/O buffer.
By default, saves as a numpy record array in the .npy format. Specify format=CSDAT_FORMAT to save in the latest .cs file format, which is faster and results in a smaller file size but is not numpy-compatible.
- Parameters:
file (str | Path | IO) – Writeable file path or handle
format (int, optional) – Must be one of the constants DEFAULT_FORMAT, NUMPY_FORMAT (same as DEFAULT_FORMAT), or CSDAT_FORMAT. Defaults to DEFAULT_FORMAT.
- Raises:
TypeError – If invalid format specified
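Examples
A minimal sketch of saving in both formats; the output paths and field names are illustrative only:
>>> from cryosparc.dataset import Dataset, CSDAT_FORMAT
>>> dset = Dataset([("uid", [1, 2, 3]), ("ctf/defocus", [1.0, 1.1, 1.2])])
>>> dset.save('/tmp/example_numpy.cs')
>>> dset.save('/tmp/example_stream.cs', format=CSDAT_FORMAT)
>>> Dataset.load('/tmp/example_stream.cs').fields()
['uid', 'ctf/defocus']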
- slice(start: int = 0, stop: int | None = None, step: int = 1)#
Get subset of the dataset with rows in the given range.
- Parameters:
start (int, optional) – Start index to slice from (inclusive). Defaults to 0.
stop (int, optional) – End index to slice until (exclusive). Defaults to length of dataset.
step (int, optional) – How many entries to step over in resulting slice. Defaults to 1.
- Returns:
subset with slice of matching rows
- Return type:
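Examples
A minimal sketch of slicing by range and by step:
>>> dset = Dataset([("uid", [10, 20, 30, 40, 50])])
>>> dset.slice(1, 4).to_list()
[[20], [30], [40]]
>>> dset.slice(step=2).to_list()
[[10], [30], [50]]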
- split_by(field: str)#
Create a mapping from possible values of the given field to datasets filtered by rows with that value.
Examples
>>> dset = Dataset([
...     ('uid', [1, 2, 3, 4]),
...     ('foo', ['hello', 'world', 'hello', 'world'])
... ])
>>> dset.split_by('foo')
{
    'hello': Dataset([('uid', [1, 3]), ('foo', ['hello', 'hello'])]),
    'world': Dataset([('uid', [2, 4]), ('foo', ['world', 'world'])])
}
- stream(compression: Literal['lz4', None] | None = None) Generator[bytes, None, None] #
Generate a binary representation for this dataset. Results may be written to a file or buffer to be sent over the network.
Buffer will have the same format as Dataset files saved with format=CSDAT_FORMAT. Call Dataset.load on the resulting file/buffer to retrieve the original data.
compression (Literal["lz4", None], optional) – Compression to apply to the generated stream. Defaults to None (no compression).
- Yields:
bytes – Dataset file chunks
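Examples
A small sketch of writing the stream to a file and loading it back; the output path and field names are illustrative, and LZ4 compression is optional:
>>> dset = Dataset([("uid", [1, 2, 3]), ("ctf/defocus", [1.0, 1.1, 1.2])])
>>> with open('/tmp/example_stream.cs', 'wb') as f:
...     for chunk in dset.stream(compression='lz4'):
...         _ = f.write(chunk)
>>> Dataset.load('/tmp/example_stream.cs').fields()
['uid', 'ctf/defocus']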
- subset(rows: Collection[Row])#
Get a subset of dataset that only includes the given list of rows (from this dataset).
- take(indices: List[int] | NDArray)#
Get a subset of data with only the matching list of row indices.
- Parameters:
indices (list[int] | NDArray[int]) – collection of indices to keep.
- Returns:
subset with matching row indices
- Return type:
- to_cstrs(*, copy: bool = False)#
Convert all Python string columns to C strings. Resulting dataset fields that previously had dtype np.object_ (or T_OBJ internally) will get type np.uint64 and may be accessed via the dataset C API.
Note: This operation takes a long time for large datasets.
- Parameters:
copy (bool, optional) – If True, returns a modified copy of the dataset instead of mutation. Defaults to False.
- Returns:
same dataset or copy if specified.
- Return type:
- to_list(exclude_uid=False) List[list] #
Convert to a list of lists, each value of the outer list representing one dataset row. Every value in the resulting list is guaranteed to be a python type (no numpy numeric types).
- Parameters:
exclude_uid (bool, optional) – If True, uid column will not be included in output list. Defaults to False.
- Returns:
list of row lists
- Return type:
list
Examples
>>> dset = Dataset([
...     ('uid', [123, 456, 789]),
...     ('foo/one', [1, 2, 3]),
...     ('foo/two', [4, 5, 6]),
... ])
>>> dset.to_list()
[[123, 1, 4], [456, 2, 5], [789, 3, 6]]
- to_pystrs(*, copy: bool = False)#
Convert all C string columns to Python strings. Resulting dataset fields that previously had dtype np.uint64 (and T_STR internally) will get type np.object_.
copy (bool, optional) – If True, returns a modified copy of the dataset instead of mutation. Defaults to False.
- Returns:
same dataset or copy if specified.
- Return type:
- to_records(fixed=False)#
Convert to a numpy record array.
- Parameters:
fixed (bool, optional) – If True, converts string columns (dtype("O")) to fixed-length strings (dtype("S")). Defaults to False.
- Returns:
Numpy record array
- Return type:
NDArray
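Examples
A minimal sketch (field names hypothetical); with fixed=True, string columns are assumed to come back as fixed-length byte strings (kind 'S'):
>>> dset = Dataset([("uid", [1, 2]), ("blob/path", ["a.mrc", "b.mrc"])])
>>> rec = dset.to_records(fixed=True)
>>> rec.dtype.names
('uid', 'blob/path')
>>> rec.dtype['blob/path'].kind
'S'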
- union(*others: Dataset, assert_same_fields=False, assume_unique=False)#
Take the row union of all the given datasets, based on their uid fields.
May be called either as an instance method or an initializer to create a new dataset from one or more datasets:
To initialize from zero or more datasets, use Dataset.union_many.
- Parameters:
assert_same_fields (bool, optional) – Set to True to enforce that datasets have identical fields. Otherwise, result only includes fields common to all datasets. Defaults to False.
assume_unique (bool, optional) – Set to True to assume that each input dataset’s UIDs are unique (though there may be common UIDs between datasets). Defaults to False.
- Returns:
Combined dataset
- Return type:
Examples
As instance method
>>> dset = d1.union(d2, d3)
As class method
>>> dset = Dataset.union(d1, d2, d3)
- classmethod union_many(*datasets: Dataset, assert_same_fields=False, assume_unique=False)#
Similar to Dataset.union. If no datasets are provided, returns an empty Dataset with just the uid field.
- Parameters:
assert_same_fields (bool, optional) – Same as for union. Defaults to False.
assume_unique (bool, optional) – Same as for union. Defaults to False.
- Returns:
combined dataset, or empty dataset if none are provided.
- Return type:
- cryosparc.dataset.NEWEST_FORMAT = 2#
Newest save .cs file format. Same as CSDAT_FORMAT.
- cryosparc.dataset.NUMPY_FORMAT = 1#
Numpy-array .cs file format. Same as DEFAULT_FORMAT.
- cryosparc.dataset.generate_uids(num: int = 0)#
Generate the given number of random 64-bit unsigned integer uids.
- Parameters:
num (int, optional) – Number of UIDs to generate. Defaults to 0.
- Returns:
Numpy array of random unsigned 64-bit integers
- Return type:
NDArray
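Examples
A minimal sketch of generating uids for a hand-built dataset (the score field is hypothetical):
>>> from cryosparc.dataset import Dataset, generate_uids
>>> uids = generate_uids(3)
>>> uids.shape, uids.dtype
((3,), dtype('uint64'))
>>> dset = Dataset({"uid": uids, "score": [0.1, 0.2, 0.3]})
>>> len(dset)
3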