2. Pick particles with crYOLO#

This example covers crYOLO Tutorial 2, using particle picks from cryoSPARC as training data. The crYOLO GUI is not required for this tutorial.

All code runs from a conda environment installed with the following commands:

conda create -n cryolo -c conda-forge \
   python=3 numpy==1.18.5 \
   libtiff pyqt=5 wxPython=4.1.1 adwaita-icon-theme
conda activate cryolo
pip install -U pip
pip install nvidia-pyindex
pip install cryolo[c11] cryosparc-tools
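
As an optional sanity check (a minimal sketch, assuming the cryolo environment above is active), confirm that both packages import:

import cryolo            # crYOLO picker package
import cryosparc.tools   # cryoSPARC API client

print("imports OK")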

2.1. Prelude#

Connect to a cryoSPARC instance with cryosparc.tools.CryoSPARC and get the project handle. This project contains a workspace W3 with the following jobs:

  • Patch CTF with 20 motion-corrected and CTF-estimated micrographs

  • Exposure Sets tool that splits the micrographs into 5 for training/validation and 15 for picking with the trained model

  • Manual Picker that picks 5 training micrographs to completion

Note

Saving crYOLO outputs to the cryoSPARC project directory requires direct file-system access to that directory.

from cryosparc.tools import CryoSPARC

cs = CryoSPARC(host="cryoem5", base_port=40000)
assert cs.test_connection()

project = cs.find_project("P251")
Connection succeeded to CryoSPARC command_core at http://cryoem5:40002
Connection succeeded to CryoSPARC command_vis at http://cryoem5:40003
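
Since crYOLO outputs are written directly into the project directory (see the note above), it is worth verifying that the directory is reachable from this machine. A minimal check, assuming the project directory is mounted at the same path here:

from pathlib import Path

# project.dir() returns the project's absolute directory path;
# this assumes the path is mounted identically on this machine
assert Path(project.dir()).is_dir()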

Programmatically create and build a new external job. This job will contain the results of both training and picking for the remaining micrographs.

Connect the training picks, training micrographs, and remaining micrographs as inputs. Specify micrograph_blob as the slot for the micrographs to retrieve the motion-corrected blob location; other micrograph slots will be connected as passthroughs. Specify the location slot for particles to load the micrograph path and (x, y) coordinates of each particle.

Create an output for the resulting picks with location and pick_stats slots.

job = project.create_external_job("W3", title="crYOLO Picks")
job.connect("train_micrographs", "J18", "split_0", slots=["micrograph_blob"])
job.connect("train_particles", "J19", "particles_selected", slots=["location"])
job.connect("all_micrographs", "J18", "split_0", slots=["micrograph_blob"])
job.connect("all_micrographs", "J18", "remainder", slots=["micrograph_blob"])
job.add_output("particle", "predicted_particles", slots=["location", "pick_stats"])
'predicted_particles'

Start the job so that outputs and the job log may be written. This puts the job in “Running” status.

job.start()
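
Optionally, write a marker line to the job’s stream log with job.log:

job.log("Preparing crYOLO training and picking data")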

2.2. Data Preparation#

Use the job directory as crYOLO’s working directory. Create all the necessary subfolders there.

job.mkdir("full_data")
job.mkdir("train_image")
job.mkdir("train_annot")

Load the input micrographs and link them into the full_data and train_image directories. This results in the following directory structure:

/path/to/project/JX/
├── full_data
│   ├── mic01.mrc -> /path/to/project/JY/motioncorrected/mic01.mrc
│   ├── mic02.mrc -> /path/to/project/JY/motioncorrected/mic02.mrc
│   ├── ...
│   └── mic20.mrc -> /path/to/project/JY/motioncorrected/mic20.mrc
└── train_image
    ├── mic01.mrc -> /path/to/project/JY/motioncorrected/mic01.mrc
    ├── mic02.mrc -> /path/to/project/JY/motioncorrected/mic02.mrc
    ├── mic03.mrc -> /path/to/project/JY/motioncorrected/mic03.mrc
    ├── mic04.mrc -> /path/to/project/JY/motioncorrected/mic04.mrc
    └── mic05.mrc -> /path/to/project/JY/motioncorrected/mic05.mrc

all_micrographs = job.load_input("all_micrographs", ["micrograph_blob"])
train_micrographs = job.load_input("train_micrographs", ["micrograph_blob"])

for mic in all_micrographs.rows():
    source = mic["micrograph_blob/path"]
    target = job.uid + "/full_data/" + source.split("/")[-1]
    project.symlink(source, target)

for mic in train_micrographs.rows():
    source = mic["micrograph_blob/path"]
    target = job.uid + "/train_image/" + source.split("/")[-1]
    project.symlink(source, target)
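
To confirm the links were created, count the entries in each directory (a quick check that assumes file-system access to the project from this machine):

from pathlib import Path

job_dir = Path(job.dir())
assert len(list(job_dir.glob("full_data/*.mrc"))) == len(all_micrographs)
assert len(list(job_dir.glob("train_image/*.mrc"))) == len(train_micrographs)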

crYOLO requires the particle locations for each micrograph in STAR format with the following directory structure:

/path/to/project/JX/
└── train_annot
    └── STAR
        ├── mic01.star
        ├── mic02.star
        ├── mic03.star
        ├── mic04.star
        └── mic05.star

Load the training particle locations. Split them up by micrograph path. Compute the pixel locations and save them to a STAR file in this format.

from io import StringIO
import numpy as np
from numpy.core import records
from cryosparc import star

job.mkdir("train_annot/STAR")
train_particles = job.load_input("train_particles", ["location"])

for micrograph_path, particles in train_particles.split_by("location/micrograph_path").items():
    micrograph_name = micrograph_path.split("/")[-1]
    star_file_name = micrograph_name.rsplit(".", 1)[0] + ".star"

    mic_w = particles["location/micrograph_shape"][:, 1]
    mic_h = particles["location/micrograph_shape"][:, 0]
    center_x = particles["location/center_x_frac"]
    center_y = particles["location/center_y_frac"]
    location_x = center_x * mic_w
    location_y = center_y * mic_h

    outfile = StringIO()
    star.write(
        outfile,
        records.fromarrays([location_x, location_y], names=["rlnCoordinateX", "rlnCoordinateY"]),
    )
    outfile.seek(0)
    job.upload("train_annot/STAR/" + star_file_name, outfile)
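
To spot-check the result, read back the last STAR file written in the loop above with cryosparc.star.read; each file should contain a single unnamed data block with the two coordinate columns:

# Read back the most recently written STAR file
check = star.read(job.dir() / "train_annot" / "STAR" / star_file_name)[""]
print(check.dtype.names, len(check))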

Preview of the particle locations used on the last training micrograph:

%matplotlib inline

from cryosparc import mrc
from cryosparc.tools import downsample, lowpass2
import matplotlib.pyplot as plt

header, mic = project.download_mrc(micrograph_path)
binned = downsample(mic, factor=3)
lowpassed = lowpass2(binned, psize_A=0.6575, cutoff_resolution_A=20, order=0.7)
height, width = lowpassed.shape
vmin = np.percentile(lowpassed, 1)
vmax = np.percentile(lowpassed, 99)

fig, ax = plt.subplots(figsize=(7.5, 8), dpi=144)
ax.axis("off")
ax.imshow(lowpassed, cmap="gray", vmin=vmin, vmax=vmax, origin="lower")
ax.scatter(center_x * width, center_y * height, c="yellow", marker="+")

fig.tight_layout()
[Output figure: low-passed training micrograph with picked particle locations marked]

2.3. Configuration#

Generate crYOLO’s configuration file from the command line. On this machine, crYOLO is installed in a conda environment in the home directory. Run the cryolo_gui.py config command from the job directory to generate a configuration file there. Use a box size of 130 for this dataset.

Use job.subprocess to forward the output to the job stream log. Arguments are similar to Python’s subprocess.Popen().

job.subprocess(
    (
        "cryolo_gui.py config config_cryolo.json 130 "
        "--train_image_folder train_image "
        "--train_annot_folder train_annot"
    ).split(" "),
    cwd=job.dir(),
)
#####################################################
Important debugging information.
In case of any problems, please provide this information.
#####################################################
/u/nfrasser/miniconda3/envs/tools/bin/cryolo_gui.py config config_cryolo.json 130
--train_image_folder train_image
--train_annot_folder train_annot
#####################################################

 Wrote config to config_cryolo.json
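
The generated config_cryolo.json is plain JSON and may be inspected or adjusted before training. A minimal sketch (the exact keys depend on the crYOLO version):

import json

with open(job.dir() / "config_cryolo.json") as f:
    config = json.load(f)

print(config["model"]["anchors"])  # box-size anchors written by the config command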

2.4. Training#

Run training on GPU 0 with 5 warmup epochs and an early stop of 15.

The output of this command is quite long, so set mute=True to hide it (it will still appear in the job’s stream log).

Use the checkpoint_line_pattern argument to mark new training-epoch lines as the beginning of a checkpoint in the stream log.

job.subprocess(
    "cryolo_train.py -c config_cryolo.json -w 5 -g 0 -e 15".split(" "),
    cwd=job.dir(),
    mute=True,
    checkpoint=True,
    checkpoint_line_pattern=r"Epoch \d+/\d+",  # e.g., "Epoch 42/200"
)

This creates a cryolo_model.h5 trained model file in the job directory.
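
Optionally verify that the model file exists before moving on to picking:

from pathlib import Path

assert (Path(job.dir()) / "cryolo_model.h5").is_file()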

2.5. Picking#

Use the trained model to predict particle locations for the full dataset. Create a boxfiles directory to store the output.

job.mkdir("boxfiles")
job.subprocess(
    "cryolo_predict.py -c config_cryolo.json -w cryolo_model.h5 -i full_data -g 0 -o boxfiles -t 0.3".split(" "),
    cwd=job.dir(),
    mute=True,
    checkpoint=True,
)

For each micrograph in the full dataset, load the corresponding output STAR file, initialize a new empty particles dataset, and fill in the predicted locations and other relevant location metadata.

Also fill in a dummy NCC score so that the results may be inspected with an Inspect Picks job.

output_star_folder = "STAR"

all_predicted = []
for mic in all_micrographs.rows():
    micrograph_path = mic["micrograph_blob/path"]
    micrograph_name = micrograph_path.split("/")[-1]
    height, width = mic["micrograph_blob/shape"]

    starfile_name = micrograph_name.rsplit(".", 1)[0] + ".star"
    starfile_path = "boxfiles/" + output_star_folder + "/" + starfile_name
    locations = star.read(job.dir() / starfile_path)[""]
    center_x = locations["rlnCoordinateX"] / width
    center_y = locations["rlnCoordinateY"] / height

    predicted = job.alloc_output("predicted_particles", len(locations))
    predicted["location/micrograph_uid"] = mic["uid"]
    predicted["location/micrograph_path"] = mic["micrograph_blob/path"]
    predicted["location/micrograph_shape"] = mic["micrograph_blob/shape"]
    predicted["location/micrograph_psize_A"] = mic["micrograph_blob/psize_A"]
    predicted["location/center_x_frac"] = center_x
    predicted["location/center_y_frac"] = center_y
    predicted["pick_stats/ncc_score"] = 0.5

    all_predicted.append(predicted)

Plot the most recently predicted particle locations to verify that crYOLO ran successfully.

header, mic = project.download_mrc(micrograph_path)
binned = downsample(mic, factor=3)
lowpassed = lowpass2(binned, psize_A=0.6575, cutoff_resolution_A=20, order=0.7)
height, width = lowpassed.shape
vmin = np.percentile(lowpassed, 1)
vmax = np.percentile(lowpassed, 99)

fig, ax = plt.subplots(figsize=(7.5, 8), dpi=144)
ax.axis("off")
ax.imshow(lowpassed, cmap="gray", vmin=vmin, vmax=vmax, origin="lower")
ax.scatter(center_x * width, center_y * height, c="cyan", marker="+")

fig.tight_layout()
[Output figure: low-passed micrograph with predicted particle locations marked]

Append all the predicted particles into a single dataset. Save it to the job’s output and mark the job as completed.

from cryosparc.dataset import Dataset

job.save_output("predicted_particles", Dataset.append(*all_predicted))
job.stop()
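
The predicted_particles output may now be connected to downstream jobs such as Inspect Picks. A sketch with cryosparc-tools follows; the job type name and connection names here are assumptions, and building the job from the cryoSPARC interface works just as well:

# Hypothetical follow-up: queue an Inspect Particle Picks job on the new output.
# "inspect_picks_v2" and the connection names are assumptions; check the job
# type register for your cryoSPARC version.
inspect = project.create_job(
    "W3",
    "inspect_picks_v2",
    connections={
        "micrographs": ("J18", "split_0"),
        "particles": (job.uid, "predicted_particles"),
    },
)
inspect.queue()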