Upstream processing
====================

tomoDRGN inputs and nomenclature
---------------------------------

Several input data types from upstream processing are required to take full advantage of tomoDRGN's heterogeneity analysis, validation, and iterative processing potential. However, only some files are required for minimal functionality. See additional details about compatible upstream STA software :ref:`here `.

#. (Required) Projection images of each particle: "image series" subtomograms

   * Real-space 2-D projection images, typically extracted from motion-corrected micrographs in ``.mrcs`` format.
   * Each particle should be sampled by multiple images, collected at a range of different stage tilts.

#. (Required) STA-preprocessed metadata for each image of each particle: "image series" star file

   * Single metadata file, typically in ``.star`` format, specifying

     - how each image of each particle is posed (translation and rotation) relative to the consensus reconstruction,
     - the parameters of the CTF affecting each particle image,
     - the path to particle images on disk,
     - which images are associated with a given particle.

#. (Recommended) STA-preprocessed metadata for each particle: "volume series" star file

   * Single metadata file, typically in ``.star`` format, specifying

     - the location and orientation of each particle in the tomogram coordinate system.

.. note::
   When tomoDRGN was initially developed, the "volume series" star file metadata was only available in a separate file, obtained by a separate subtomogram extraction (Warp v1 "volume series" subtomograms), from the "image series" metadata (generated by Warp v1 "image series" or "particle series" subtomogram extraction). However, several more recent software packages adopt a new "optimisation set" format which makes both sets of metadata available in a single star file (as in WarpTools and recent versions of RELION).

   Therefore, for historical reasons, tomoDRGN script options often specify whether an "image series" or a "volume series" star file is required for a given script, whereas an "optimisation set" star file can generally be used in either role (see specific examples in the :doc:`command usage <../command_usage/index>` section for more details).

Obtaining particles
--------------------

.. tab-set::

   .. tab-item:: Processing from raw data

      Raw tilt movies for EMPIAR-10499 can be downloaded from EMPIAR following the latest instructions `here `_. The raw dataset requires about 85 GB storage capacity. Cryo-ET preprocessing and subsequent STA of 70S ribosomes should be performed using one of tomoDRGN's :ref:`supported upstream software stacks `.

   .. tab-item:: Download preprocessed particles

      We have previously processed 70S ribosomes from EMPIAR-10499 using the Warp v1 -> RELION v3 -> M pipeline, as detailed in our tomoDRGN manuscript. These preprocessed particles are available from EMPIAR via accession ID EMPIAR-11843 and can be downloaded from EMPIAR following the latest instructions `here `_.

      Six datasets (plus a directory for associated star files) are deposited at that accession ID, containing the same set of particles re-extracted at different box and pixel sizes. For the sake of simplicity, we will only be working with the following datasets, which require about 60 GB storage capacity:

      * 22,291 ribosomes extracted as "image series" with box size 96 px and pixel size 3.71 Å/px
      * 22,291 ribosomes extracted as "volume series" with box size 64 px and pixel size 6 Å/px
      * star files containing metadata for each set of extracted particles
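Once the star files are on disk, it can be useful to confirm that an "image series" star file really does associate several tilt images with each particle, as described in the inputs list above. The following is a minimal sketch of such a check, not part of tomoDRGN itself. It assumes a single ``loop_`` table with whitespace-separated values, and it assumes the particle identifier is stored in a ``_rlnGroupName`` column; the column actually used depends on the upstream software, and the tiny table embedded in the example (file names, group names, angles) is entirely hypothetical.

```python
from collections import Counter
from typing import Dict, List

# Hypothetical, hand-written stand-in for an "image series" star file.
EXAMPLE_STAR = """\
data_

loop_
_rlnImageName #1
_rlnGroupName #2
_rlnAngleRot #3
000001@particles/tomo1_p0.mrcs tomo1_0 12.5
000002@particles/tomo1_p0.mrcs tomo1_0 12.7
000003@particles/tomo1_p0.mrcs tomo1_0 12.9
000001@particles/tomo1_p1.mrcs tomo1_1 85.0
000002@particles/tomo1_p1.mrcs tomo1_1 85.2
"""


def read_star_loop(text: str) -> List[Dict[str, str]]:
    """Parse the first ``loop_`` table in STAR-formatted text into row dicts."""
    columns: List[str] = []
    rows: List[Dict[str, str]] = []
    in_loop = False
    for raw in text.splitlines():
        line = raw.strip()
        if not in_loop:
            # Skip everything until the table starts.
            in_loop = line == "loop_"
            continue
        if not line:
            if rows:
                break  # a blank line after data rows ends the table
            continue
        if line.startswith("#"):
            continue  # comment line
        if line.startswith("data_") or line == "loop_":
            break  # a new block starts; stop at the first table
        if line.startswith("_") and not rows:
            # Header line such as "_rlnImageName #1"; drop the optional index.
            columns.append(line.split()[0])
            continue
        rows.append(dict(zip(columns, line.split())))
    return rows


def images_per_particle(
    rows: List[Dict[str, str]], particle_column: str = "_rlnGroupName"
) -> Counter:
    """Count how many image rows share each particle identifier.

    The default column name is an assumption, not a tomoDRGN guarantee.
    """
    return Counter(row[particle_column] for row in rows)


rows = read_star_loop(EXAMPLE_STAR)
for particle, n_images in sorted(images_per_particle(rows).items()):
    print(f"{particle}: {n_images} images")
```

To run the same check on a real file, replace ``EXAMPLE_STAR`` with the contents of the star file (for example, ``open(path).read()``) and adjust ``particle_column`` to whichever column your upstream software uses to group images by particle. A particle that reports only one image would suggest the file is not an "image series" star file.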