Upstream processing
tomoDRGN inputs and nomenclature
Several input data types from upstream processing are required to take full advantage of tomoDRGN’s heterogeneity analysis, validation, and iterative processing potential. However, only some files are required for minimal functionality. See additional details about compatible upstream STA software here.
(Required) projection images of each particle: “image series” subtomograms
Real-space 2-D projection images, typically extracted from motion-corrected micrographs in .mrcs format.
Each particle should be sampled by multiple images, collected at a range of different stage tilts.
(Required) STA-preprocessed metadata for each image of each particle: “image series” star file
Single metadata file, typically in .star format, specifying:
how each image of each particle is posed (translation and rotation) relative to the consensus reconstruction,
the parameters of the CTF affecting each particle image,
the path to particle images on disk,
which images are associated with a given particle.
(Recommended) STA-preprocessed metadata for each particle: “volume series” star file
Single metadata file, typically in .star format, specifying the location and orientation of each particle in the tomogram coordinate system.
Note
When tomoDRGN was initially developed, the “volume series” star file metadata was only available in a separate file, produced by a separate subtomogram extraction (Warp v1 “volume series” subtomograms) from the one that generated the “image series” metadata (Warp v1 “image series” or “particle series” subtomogram extraction). However, several more recent software packages adopt a new “optimisation set” format, which makes both sets of metadata available in a single star file (as in WarpTools and recent versions of RELION). Therefore, for historical reasons, tomoDRGN script options often specify whether an “image series” or a “volume series” star file is required for a given script, whereas an “optimisation set” star file can generally be used in either role (see specific examples in the command usage section for more details).
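To make the role of the “image series” star file concrete, the following is a minimal sketch of how per-image rows can be grouped back into particles. It is not tomoDRGN's own reader: `parse_star_loop` is a hypothetical helper that handles only a single `loop_` block, the three rows are invented toy data, and the use of `_rlnGroupName` as the per-particle identifier is an assumption that depends on the upstream software that wrote the file.

```python
from collections import defaultdict

# Toy "image series" star file: one row per image, several images per particle.
# All values are invented for illustration.
TOY_STAR = """
data_

loop_
_rlnImageName #1
_rlnGroupName #2
_rlnDefocusU #3
000001@particles/tomo1_p0.mrcs tomo1_p0 31000.0
000002@particles/tomo1_p0.mrcs tomo1_p0 31250.0
000001@particles/tomo1_p1.mrcs tomo1_p1 29800.0
"""


def parse_star_loop(text):
    """Parse the first loop_ block of a STAR-formatted string into a list of dicts.

    Hypothetical helper: ignores everything outside the first loop_ block and
    assumes whitespace-separated values with no quoted strings.
    """
    labels, rows = [], []
    in_loop = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if line == "loop_":
            if in_loop:
                break  # a second loop_ starts a new table; stop here
            in_loop = True
            continue
        if not in_loop:
            continue
        if line.startswith("data_"):
            break  # next data block; stop here
        if line.startswith("_"):
            labels.append(line.split()[0])  # drop the trailing "#N" column index
        else:
            rows.append(dict(zip(labels, line.split())))
    return rows


rows = parse_star_loop(TOY_STAR)

# Group image paths by particle. The identifier column is an assumption.
images_by_particle = defaultdict(list)
for row in rows:
    images_by_particle[row["_rlnGroupName"]].append(row["_rlnImageName"])

images_per_particle = {pid: len(imgs) for pid, imgs in images_by_particle.items()}
print(images_per_particle)
```

In a real file, the same rows would also carry the pose and CTF columns listed above. Check the column names against the star file written by your own upstream software before relying on them.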
Obtaining particles
Raw tilt movies for EMPIAR-10499 can be downloaded from EMPIAR following the latest instructions here. The raw dataset requires about 85 GB of storage.
Cryo-ET preprocessing and subsequent STA of 70S ribosomes should be performed using one of tomoDRGN’s supported upstream software stacks.
We have previously processed 70S ribosomes from EMPIAR-10499 using the Warp v1 -> RELION v3 -> M pipeline, as detailed in our tomoDRGN manuscript. These preprocessed particles are available from EMPIAR via accession ID EMPIAR-11843 and can be downloaded from EMPIAR following the latest instructions here.
Six datasets (plus a directory for associated star files) are deposited at that accession ID, containing the same set of particles re-extracted at different box and pixel sizes. For simplicity, we will work only with the following datasets, which require about 60 GB of storage:
22,291 ribosomes extracted as “image series” with box size 96 px and pixel size 3.71 Å/px
22,291 ribosomes extracted as “volume series” with box size 64 px and pixel size 6 Å/px
star files containing metadata for each set of extracted particles
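The two extractions differ in sampling but cover a similar physical region, which a short calculation makes explicit. The sketch below multiplies box size by pixel size to get the physical box width, and doubles the pixel size to get the Nyquist limit. The function and variable names are illustrative only. The numeric inputs are copied from the dataset descriptions above.

```python
# Box and pixel sizes copied from the dataset descriptions above.
datasets = {
    "image series": {"box_px": 96, "pixel_size_A": 3.71},
    "volume series": {"box_px": 64, "pixel_size_A": 6.0},
}


def box_extent_angstrom(box_px, pixel_size_A):
    """Physical side length of the extraction box, in angstroms."""
    return box_px * pixel_size_A


def nyquist_resolution_angstrom(pixel_size_A):
    """Finest spacing representable at this sampling: two pixels."""
    return 2.0 * pixel_size_A


# Map each dataset name to (box extent, Nyquist limit), both in angstroms.
summary = {
    name: (
        box_extent_angstrom(p["box_px"], p["pixel_size_A"]),
        nyquist_resolution_angstrom(p["pixel_size_A"]),
    )
    for name, p in datasets.items()
}

for name, (extent, nyquist) in summary.items():
    print(f"{name}: box {extent:.2f} A wide, Nyquist limit {nyquist:.2f} A")
```

Both boxes span a comparable width (roughly 356 Å and 384 Å), while the “image series” extraction retains finer sampling (Nyquist limit of about 7.4 Å versus 12 Å).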