Tutorial: EMPIAR-10499 70S ribosomes

Here we present tutorials for processing heterogeneous ribosome data from the cryo-ET benchmark dataset EMPIAR-10499, as described in our tomoDRGN manuscript. The tutorials cover the following stages of processing:

  1. upstream processing and obtaining input data for tomoDRGN

  2. validating that particles and metadata were extracted correctly: tomodrgn backproject_voxel, or tomodrgn train_nn with tomodrgn convergence_nn

  3. learning structural heterogeneity within the dataset: tomodrgn train_vae with tomodrgn convergence_vae

  4. analyzing structural heterogeneity within the dataset: tomodrgn analyze, tomodrgn eval_vol and tomodrgn analyze_volumes, and external tools including SIREn and MAVEn

  5. visualizing structural heterogeneity patterns in the tomogram’s spatial context: tomoDRGN’s interactive visualization Jupyter notebook and tomodrgn subtomo2chimerax

  6. isolating particle subsets of interest: tomoDRGN’s interactive filtering Jupyter notebook and tomodrgn filter_star

  7. taking homogeneous particle subsets back into external STA tools for further refinement

With these steps as building blocks, many additional types of analyses are possible.
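As a quick orientation before starting, the subcommands named in the stages above can be collected and their help text printed. The following is a minimal sketch, not part of tomoDRGN itself: the stage labels (`validate`, `learn`, and so on) and the helper functions `help_commands` and `show_help` are hypothetical names introduced here for illustration, and the sketch assumes that each subcommand accepts a `--help` flag and that the `tomodrgn` executable is on your PATH.

```python
import shutil
import subprocess

# Subcommand names are taken from the stage list above. The dictionary keys
# are illustrative labels only, not tomoDRGN terminology.
STAGES = {
    "validate": ["backproject_voxel", "train_nn", "convergence_nn"],
    "learn": ["train_vae", "convergence_vae"],
    "analyze": ["analyze", "eval_vol", "analyze_volumes"],
    "visualize": ["subtomo2chimerax"],
    "filter": ["filter_star"],
}


def help_commands(stages=STAGES):
    """Return one `tomodrgn <subcommand> --help` argument list per subcommand."""
    return [["tomodrgn", sub, "--help"] for subs in stages.values() for sub in subs]


def show_help(stages=STAGES):
    """Print each subcommand's help text; return False if tomodrgn is not installed."""
    if shutil.which("tomodrgn") is None:
        print("tomodrgn not found on PATH; activate its environment first")
        return False
    for argv in help_commands(stages):
        # Assumption: every subcommand supports --help.
        subprocess.run(argv, check=False)
    return True
```

Calling `show_help()` inside an environment where tomoDRGN is installed prints the options for every subcommand used in these tutorials. The authoritative syntax for each command is the one given in the command usage section.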

Note

These tutorials were originally written with data processed through the Warp v1 -> RELION v3 -> M STA pipeline. The command syntax used in the tutorials is therefore specific to “Warp v1 style inputs”. Alternative syntax for other STA pipelines, including “WarpTools style inputs”, is given in the command usage section.