tomodrgn analyze

Purpose

Run standard analyses of a model trained with train_vae: dimensionality reduction and clustering of the latent space, followed by generation of volumes from the latent k-means clusters and latent PCA traversals via the tomoDRGN decoder for further analysis.

Sample usage

The examples below are adapted from tomodrgn/testing/commandtest*.py. They rely on outputs from other commands in commandtest.py to execute successfully.

# Warp v1 style inputs
tomodrgn \
    analyze \
    output/vae_both_sim_zdim8_dosetiltweightmask_batchsize8 \
    --ksample 20

# WarpTools style inputs
tomodrgn \
    analyze \
    output/vae_warptools_70S_zdim8_dosetiltweightmask_batchsize8 \
    --ksample 20

Arguments

usage: analyze [-h] [--epoch EPOCH] [--device DEVICE] [-o OUTDIR] [--skip-vol]
               [--skip-umap] [--plot-format {png,svgz}] [--pc PC]
               [--pc-ondata] [--ksample KSAMPLE] [--downsample DOWNSAMPLE]
               [--lowpass LOWPASS] [--flip] [--invert]
               workdir

Positional Arguments

workdir

Directory with tomoDRGN results

Named Arguments

--epoch

Epoch number N to analyze (0-based indexing, corresponding to z.N.pkl and weights.N.pkl). Supplying 'latest' auto-detects the most recent completed epoch of training.

Default: 'latest'
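
The 'latest' lookup can be pictured as a scan of the work directory for the highest epoch number. The sketch below is only an illustration and not tomoDRGN's own implementation: the helper name find_latest_epoch is hypothetical, and it assumes an epoch counts as completed when both weights.N.pkl and z.N.pkl are present directly in workdir.

```python
import re
from pathlib import Path


def find_latest_epoch(workdir):
    """Return the largest N for which weights.N.pkl and z.N.pkl both exist.

    Hypothetical helper: the "both files present" rule is an assumption,
    not necessarily the rule tomoDRGN applies.
    """
    pattern = re.compile(r"^weights\.(\d+)\.pkl$")
    epochs = []
    for path in Path(workdir).iterdir():
        match = pattern.match(path.name)
        # Only count epochs that also have a matching latent embedding file.
        if match and (path.parent / f"z.{match.group(1)}.pkl").exists():
            epochs.append(int(match.group(1)))
    if not epochs:
        raise FileNotFoundError(f"no completed epochs found in {workdir}")
    # Compare as integers so that epoch 10 sorts after epoch 9.
    return max(epochs)
```

Comparing the parsed integers rather than the file names avoids the lexicographic trap where weights.9.pkl would sort after weights.10.pkl.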

Core arguments

--device

Optionally specify CUDA device

-o, --outdir

Output directory for analysis results (default: [workdir]/analyze.[epoch])

--skip-vol

Skip generation of volumes

Default: False

--skip-umap

Skip running UMAP

Default: False

--plot-format

Possible choices: png, svgz

File format with which to save plots

Default: 'png'

Arguments for latent space analysis

--pc

Number of principal component traversals to generate

Default: 2

--pc-ondata

Find closest on-data latent point to each PC percentile

Default: False
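
To make the difference between a plain PC traversal and --pc-ondata concrete, the following is a minimal NumPy sketch, not tomoDRGN's code. It assumes the latent embeddings are an (N, zdim) array, that traversal points are placed at percentiles of the particle scores along one principal component, and that "closest" means Euclidean distance in latent space. The function name pc_traversal and the percentile values are illustrative.

```python
import numpy as np


def pc_traversal(z, pc=0, percentiles=(5, 25, 50, 75, 95), on_data=False):
    """Sample latent points along one principal component of z (N x zdim)."""
    mean = z.mean(axis=0)
    centered = z - mean
    # Rows of vt are the principal axes, ordered by decreasing variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    axis = vt[pc]
    scores = centered @ axis  # projection of every particle onto this PC
    targets = np.percentile(scores, percentiles)
    # Off-data points: on the PC line through the mean, other PCs held at 0.
    points = mean + np.outer(targets, axis)
    if not on_data:
        return points, None
    # Snap each target to the nearest real particle embedding.
    dists = np.linalg.norm(z[None, :, :] - points[:, None, :], axis=2)
    nearest = dists.argmin(axis=1)
    return z[nearest], nearest
```

With on_data=False the returned points generally do not coincide with any particle; with on_data=True every returned point is an actual row of z, and the second return value gives the corresponding particle indices.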

--ksample

Number of k-means samples to generate

Default: 20
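
The idea behind k-means sampling of latent space can be sketched as follows: cluster the embeddings into k groups, then take the particle nearest each cluster center as that cluster's representative. This is a self-contained NumPy illustration using plain Lloyd iterations, not the routine tomoDRGN calls; the function name kmeans_on_data, the random initialization, and the fixed iteration count are all assumptions.

```python
import numpy as np


def kmeans_on_data(z, k, n_iter=50, seed=0):
    """Cluster z (N x zdim) into k groups and pick one real particle per group."""
    rng = np.random.default_rng(seed)
    # Initialize centers from k distinct particles (fancy indexing copies).
    centers = z[rng.choice(len(z), size=k, replace=False)]
    for _ in range(n_iter):
        dists = np.linalg.norm(z[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            members = z[labels == j]
            if len(members):  # keep the old center if a cluster empties out
                centers[j] = members.mean(axis=0)
    # Final assignment against the converged centers.
    dists = np.linalg.norm(z[:, None, :] - centers[None, :, :], axis=2)
    labels = dists.argmin(axis=1)
    representatives = dists.argmin(axis=0)  # closest particle to each center
    return labels, centers, representatives
```

Decoding z[representatives] rather than the centers themselves keeps every generated volume tied to a latent position that a real particle occupies.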

Arguments for volume generation

--downsample

Downsample volumes to this box size (pixels)

--lowpass

Lowpass filter to this resolution in Å
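
When --downsample and --lowpass are combined, it helps to check that the requested lowpass resolution is still representable at the reduced box size. The arithmetic below assumes downsampling is done by Fourier cropping, so the field of view is preserved and the pixel size grows by the ratio of box sizes. The function name and the example numbers are illustrative only.

```python
def downsampled_nyquist(apix, box, downsample):
    """Return (new pixel size, Nyquist resolution) in Å after downsampling.

    apix: original pixel size in Å/px; box: original box size in px;
    downsample: target box size in px. Assumes Fourier cropping.
    """
    new_apix = apix * box / downsample
    # Nyquist resolution is two pixels at the new sampling.
    return new_apix, 2.0 * new_apix


new_apix, nyquist = downsampled_nyquist(apix=3.0, box=96, downsample=64)
print(new_apix, nyquist)  # 4.5 9.0
```

In this example, a lowpass value finer than 9 Å would have no effect on a volume downsampled to 64 pixels, because that information is already beyond Nyquist.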

--flip

Flip handedness of output volumes

Default: False

--invert

Invert contrast of output volumes

Default: False
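
For intuition, both --flip and --invert correspond to simple array operations on a volume. The sketch below assumes the volume is a 3D NumPy array. Mirroring along any single axis changes handedness; which axis tomoDRGN mirrors is not stated here, so the choice of axis 0 is an assumption.

```python
import numpy as np


def flip_handedness(vol):
    """Mirror the volume along one axis (axis 0 chosen here by assumption)."""
    return np.flip(vol, axis=0)


def invert_contrast(vol):
    """Negate voxel values so that dark density becomes bright and vice versa."""
    return -vol
```

Both operations are their own inverse, so applying either flag to an already corrected volume undoes the correction.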

Common next steps

  • Interactively explore correlations among, and the spatial context of, star file parameters, latent embeddings, and volume space dimensionality reduction in the generated Jupyter notebooks

  • Identify one (or more) sets of particle indices whose particles share a common feature (e.g. in latent space)

  • Filter the input star file by particle indices with tomodrgn filter_star

  • Generate an array of numeric labels describing a latent space property for each particle to color volumes in tomogram mapbacks with tomodrgn subtomo2chimerax
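
The index selection step in the list above can be sketched in a few lines of Python. This is an illustration under stated assumptions: labels is a per-particle array of cluster labels (for example from k-means on the latent embeddings), the helper names are hypothetical, and saving the indices as a pickled NumPy array is an assumed file format. Check the tomodrgn filter_star help text for the format it actually expects.

```python
import pickle

import numpy as np


def select_indices(labels, keep):
    """Return sorted particle indices whose label is in the set `keep`."""
    return np.flatnonzero(np.isin(labels, list(keep)))


def save_pickle(obj, path):
    """Write obj to path with pickle (assumed, not confirmed, file format)."""
    with open(path, "wb") as f:
        pickle.dump(obj, f)


# Toy example: six particles assigned to three clusters.
labels = np.array([0, 1, 2, 1, 0, 2])
ind = select_indices(labels, keep={1, 2})
print(ind)  # [1 2 3 5]
```

The same labels array, one numeric value per particle, is also the kind of input the last step in the list describes for coloring volumes in tomogram mapbacks.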