Validate particle extraction#

Purpose#

After extracting (or downloading) a subtomogram particleseries, validate that the extraction worked correctly. A quick check is to generate a homogeneous 3-D reconstruction from the extracted 2-D particle images, using either tomodrgn backproject_voxel or tomodrgn train_nn. Both approaches benefit from a machine with a GPU and enough RAM to hold all particles in memory. These hardware resources matter most for tomodrgn train_nn.

For most users, we recommend running tomodrgn backproject_voxel as it is faster than tomodrgn train_nn. However, we include examples of each for reference.

Performing homogeneous reconstruction#

In this example, we backproject all particles referenced in imageseries.star. The full list of command line arguments can be found here.

mkdir 01_validate_backproject
tomodrgn backproject_voxel \
    imageseries.star \
    --output 01_validate_backproject/backproject_weighted.mrc \
    --datadir .../path/to/particleseries/images \
    --recon-dose-weight \
    --recon-tilt-weight

This command produces several outputs in the 01_validate_backproject directory:

  • backproject_weighted.mrc: this is the unfiltered backprojected reconstruction

  • backproject_weighted_half*.mrc: these two maps are the unfiltered backprojected half-map reconstructions from randomly selected halves of the dataset

  • backproject_weighted_fsc.png: this is the FSC between the two half-maps, calculated with an automatically generated soft mask

  • backproject_weighted_filt.mrc: this is the backprojected reconstruction, lowpass filtered to the resolution at which the half-map FSC drops below 0.143
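Before opening any maps, it can help to confirm that all of the files listed above were actually written. The following is a minimal sketch, not part of tomoDRGN: the function name check_backproject_outputs is hypothetical, and it assumes the output prefix is backproject_weighted (as in the command above) and that exactly two half-maps match the backproject_weighted_half*.mrc pattern.

```shell
# Hypothetical helper (not part of tomoDRGN): report which of the expected
# backproject_voxel outputs are missing or empty.
# Usage: check_backproject_outputs OUTPUT_DIR [PREFIX]
check_backproject_outputs() {
    local dir="$1"
    local prefix="${2:-backproject_weighted}"
    local missing=0
    local n_half=0
    local f

    # Files named explicitly in the output list.
    for f in "${prefix}.mrc" "${prefix}_fsc.png" "${prefix}_filt.mrc"; do
        if [ ! -s "${dir}/${f}" ]; then
            echo "missing or empty: ${dir}/${f}"
            missing=1
        fi
    done

    # Half-maps: assumed to be two files matching PREFIX_half*.mrc.
    for f in "${dir}/${prefix}"_half*.mrc; do
        # If nothing matches, the pattern stays literal and -s is false.
        if [ -s "$f" ]; then
            n_half=$((n_half + 1))
        fi
    done
    if [ "$n_half" -ne 2 ]; then
        echo "expected 2 half-maps, found ${n_half}"
        missing=1
    fi

    # Exit status 0 if everything was found, 1 otherwise.
    return "$missing"
}
```

For the example above, this would be called as check_backproject_outputs 01_validate_backproject. A non-zero exit status means at least one file is absent or empty.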

In this example, we train a homogeneous (decoder-only) tomoDRGN network to reconstruct the particles referenced in imageseries.star. The full list of command line arguments can be found here.

tomodrgn train_nn \
    imageseries.star \
    --outdir 02_validate_train-nn \
    --datadir .../path/to/particleseries/images \
    --recon-dose-weight \
    --recon-tilt-weight \
    --l-dose-mask \
    --num-epochs 20

This command produces several outputs in the 02_validate_train-nn directory:

  • config.pkl

  • run.log

  • weights.*.pkl

  • reconstruct.*.mrc

Interpreting outputs#

Open backproject_weighted_filt.mrc or reconstruct.19.mrc, as appropriate, in ChimeraX or a similar 3-D volume viewer. If all went well, you should see a reconstruction that resembles your expected structure. Note that tomodrgn backproject_voxel models the CTF via phase flipping only, whereas tomodrgn train_nn models the CTF via both phase and amplitude correction, because the two tools use different reconstruction approaches.
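If you trained for a different number of epochs, the volume to open is the highest-numbered reconstruct.N.mrc rather than reconstruct.19.mrc. A plain alphabetical listing places reconstruct.9.mrc after reconstruct.19.mrc, so the comparison needs to be numeric. The sketch below is one way to do this. The function name latest_reconstruct is hypothetical, and it assumes N is a plain integer epoch index.

```shell
# Hypothetical helper (not part of tomoDRGN): print the path of the
# highest-numbered reconstruct.N.mrc in a train_nn output directory.
# Assumes N is a plain integer, as in reconstruct.19.mrc.
latest_reconstruct() {
    local dir="$1"
    local best=-1
    local f n

    for f in "${dir}"/reconstruct.*.mrc; do
        [ -e "$f" ] || continue          # the pattern matched nothing
        n="${f##*/reconstruct.}"         # strip directory and "reconstruct."
        n="${n%.mrc}"                    # strip the extension
        case "$n" in
            ''|*[!0-9]*) continue ;;     # skip names whose index is not an integer
        esac
        # Compare numerically so that epoch 19 ranks above epoch 9.
        if [ "$n" -gt "$best" ]; then
            best="$n"
        fi
    done

    if [ "$best" -lt 0 ]; then
        echo "no reconstruct.N.mrc found in ${dir}" >&2
        return 1
    fi
    echo "${dir}/reconstruct.${best}.mrc"
}
```

For the training run above, latest_reconstruct 02_validate_train-nn would be expected to print 02_validate_train-nn/reconstruct.19.mrc.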

Assessing model convergence and overfitting#

If you used tomodrgn train_nn, you can run tomodrgn convergence_nn to monitor the FSC between the trained model’s consensus reconstruction and an external consensus reconstruction at every epoch for which a checkpoint was evaluated during training.

tomodrgn convergence_nn \
    02_validate_train-nn \
    path/to/reference_volume.mrc \
    --fsc-mask soft

The outputs of this command include the following:

  • plots

    • FSC between reconstruct.*.mrc and reference_volume.mrc at every epoch, using the specified --fsc-mask

    • FSC between reconstruct.*.mrc and reference_volume.mrc at the final training epoch, using the specified --fsc-mask

    • resolution at FSC correlation of 0.5 at every epoch

    • resolution at FSC correlation of 0.143 at every epoch

  • freqs_fscs.pkl: the spatial resolution and FSC information stored as a tuple of numpy arrays in a .pkl file

Model convergence generally appears as FSC curves that stabilize over successive training epochs. Model overfitting generally appears as FSC curves that worsen over successive training epochs.
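The per-epoch resolution values at FSC 0.5 and 0.143 follow the usual threshold-crossing idea: the reported resolution is the reciprocal of the first spatial frequency at which the FSC falls below the threshold. The sketch below illustrates that calculation only. It does not read freqs_fscs.pkl. The function name fsc_resolution and the two-column text input (frequency in 1/Å, then FSC, in ascending frequency order) are assumptions made for illustration, so you would first need to export the curve to that form yourself.

```shell
# Hypothetical helper: read "frequency fsc" pairs from stdin (one pair per
# line, frequency in 1/Å and ascending) and print the resolution in Å at
# the first point where the FSC falls below the threshold (default 0.143).
fsc_resolution() {
    local threshold="${1:-0.143}"
    awk -v t="$threshold" '
        # Skip the zero-frequency term to avoid dividing by zero.
        $1 > 0 && $2 < t { printf "%.2f\n", 1 / $1; found = 1; exit }
        END { if (!found) print "not reached" }
    '
}
```

Running this on the curve from each epoch and comparing the values gives a rough numerical counterpart to the plots: values that settle suggest convergence, and values that get worse in later epochs suggest overfitting.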

Common pitfalls#

If your reconstruction does not look like an interpretable structure similar to that produced by upstream processing, here are a few things to check:

  • volume looks hollow: try adding --uninvert-data to your backproject_voxel or train_nn command to fix the data sign convention for your particles (light-on-dark vs dark-on-light)

  • volume looks like a featureless ball of the appropriate diameter for your particle: the rotations specified in your star file may be inaccurate. Check these poses by reconstructing these particles with mpirun -n NUM_MPI_PROCESSES relion_reconstruct_mpi --i particleseries.star --o reconstruct_relion.mrc --ctf.

  • volume looks like a cube of noise:

    1. try setting a stronger lowpass filter (--lowpass) or using more input particles (--use-first-nptcls) if using tomodrgn backproject_voxel, or training for fewer epochs (--num-epochs) with more particles (--use-first-nptcls) if using tomodrgn train_nn

    2. confirm that your particle coordinates were supplied for extraction with the correct pixel size and align with your desired particles (e.g. with Cube, Napari, or similar). Check these poses by reconstructing these particles with mpirun -n NUM_MPI_PROCESSES relion_reconstruct_mpi --i particleseries.star --o reconstruct_relion.mrc --ctf.
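For the hollow-volume case, one convenient approach is to repeat the backprojection with --uninvert-data and write the result under a different name, so that both maps can be opened side by side. The wrapper below is a sketch under stated assumptions: the function name backproject_uninverted and the output file name backproject_weighted_uninvert.mrc are arbitrary choices for illustration, and the remaining flags are copied from the backprojection example on this page.

```shell
# Hypothetical wrapper (not part of tomoDRGN): rerun the backprojection
# from this page with --uninvert-data added, writing to a separate file.
# Usage: backproject_uninverted STAR_FILE DATADIR OUTDIR
# Set DRY_RUN=1 to print the command instead of running it.
backproject_uninverted() {
    local star="$1"
    local datadir="$2"
    local outdir="$3"

    # Rebuild the positional parameters as the command to run.
    set -- tomodrgn backproject_voxel "$star" \
        --output "${outdir}/backproject_weighted_uninvert.mrc" \
        --datadir "$datadir" \
        --recon-dose-weight \
        --recon-tilt-weight \
        --uninvert-data

    if [ "${DRY_RUN:-0}" = "1" ]; then
        printf '%s\n' "$*"
    else
        mkdir -p "$outdir" && "$@"
    fi
}
```

Calling it with DRY_RUN=1 first lets you check the assembled command before committing compute time to it.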