tomodrgn analyze_volumes

Purpose

Run standard volume-space analyses on an ensemble of volumes generated from a trained train_vae model: dimensionality reduction (voxel-space PCA, followed by UMAP) and k-means clustering of the volume ensemble.

Sample usage

The examples below are adapted from tomodrgn/testing/commandtest*.py. They rely on outputs produced by earlier commands in commandtest.py, so they will only run successfully after those commands.

# Warp v1 style inputs
tomodrgn \
    analyze_volumes \
    --voldir output/vae_both_sim_zdim2/eval_vol_allz \
    --config output/vae_both_sim_zdim2/config.pkl \
    --outdir output/vae_both_sim_zdim2/eval_vol_allz_analyze_volumes_mask_soft \
    --ksample 20 \
    --mask soft

# WarpTools style inputs
tomodrgn \
    analyze_volumes \
    --voldir output/vae_warptools_70S_zdim2/eval_vol_allz \
    --config output/vae_warptools_70S_zdim2/config.pkl \
    --outdir output/vae_warptools_70S_zdim2/eval_vol_allz_analyze_volumes_mask_soft \
    --ksample 20 \
    --mask soft
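Sometimes you need to script several runs, for example one per mask type. In that case the same invocation can be assembled from Python. The helper `build_analyze_volumes_cmd` below is hypothetical and is not part of tomodrgn. It only emits flags listed under Arguments and passes optional flags only when they are set, so tomoDRGN's own defaults apply otherwise. The output paths mirror the naming used in the examples above.

```python
import shlex


def build_analyze_volumes_cmd(voldir, config, outdir=None, num_pcs=None, ksample=None, mask=None):
    """Assemble a `tomodrgn analyze_volumes` command as an argument list (hypothetical helper)."""
    cmd = ["tomodrgn", "analyze_volumes", "--voldir", voldir, "--config", config]
    # Optional flags are appended only when given, leaving tomoDRGN's defaults in effect otherwise
    if outdir is not None:
        cmd += ["--outdir", outdir]
    if num_pcs is not None:
        cmd += ["--num-pcs", str(num_pcs)]
    if ksample is not None:
        cmd += ["--ksample", str(ksample)]
    if mask is not None:
        cmd += ["--mask", mask]
    return cmd


run_dir = "output/vae_both_sim_zdim2"
for mask in ("sphere", "tight", "soft"):
    cmd = build_analyze_volumes_cmd(
        voldir=f"{run_dir}/eval_vol_allz",
        config=f"{run_dir}/config.pkl",
        outdir=f"{run_dir}/eval_vol_allz_analyze_volumes_mask_{mask}",
        ksample=20,
        mask=mask,
    )
    # Print a shell-quoted command; to execute it instead, use subprocess.run(cmd, check=True)
    print(shlex.join(cmd))
```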

Arguments

usage: analyze_volumes [-h] --voldir VOLDIR --config CONFIG [--outdir OUTDIR]
                       [--num-pcs NUM_PCS] [--ksample KSAMPLE]
                       [--plot-format {png,svgz}] [--mask-path MASK_PATH]
                       [--mask {none,sphere,tight,soft}] [--thresh THRESH]
                       [--dilate DILATE] [--dist DIST]

Core arguments

--voldir

path to directory containing volumes to analyze

--config

path to train_vae config file

--outdir

path to directory in which to save outputs. Default is the same parent directory and basename as --voldir, with analyze_volumes appended to the basename

--num-pcs

number of principal components (PCs) to keep when saving PCA results and running UMAP

Default: 128
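Voxel-space PCA treats each volume as one long vector of voxel intensities. The minimal numpy sketch below assumes the volumes are already loaded (and masked) into a single array. tomoDRGN's implementation may differ, for example in normalization or in the solver used. The sketch also shows a general property of PCA: at most min(n_volumes, n_voxels) components exist, so fewer than --num-pcs PCs can be kept when few volumes are analyzed.

```python
import numpy as np


def voxel_pca(volumes, num_pcs=128):
    """Project flattened volumes onto their leading principal components.

    volumes: array of shape (n_volumes, D, D, D)
    Returns (coords, components, explained_variance_ratio) for the kept PCs.
    """
    n = volumes.shape[0]
    X = volumes.reshape(n, -1).astype(np.float64)  # one row per volume, one column per voxel
    X -= X.mean(axis=0)                            # center each voxel across the ensemble
    # Thin SVD: rows of vt are principal axes in voxel space
    u, s, vt = np.linalg.svd(X, full_matrices=False)
    k = min(num_pcs, len(s))                       # cannot keep more PCs than min(n_volumes, n_voxels)
    coords = u[:, :k] * s[:k]                      # per-volume coordinates along each kept PC
    explained = s ** 2 / np.sum(s ** 2)
    return coords, vt[:k], explained[:k]


# Toy ensemble: 50 random 16^3 volumes, so only 50 PCs can be kept despite num_pcs=128
rng = np.random.default_rng(0)
vols = rng.normal(size=(50, 16, 16, 16)).astype(np.float32)
coords, components, evr = voxel_pca(vols, num_pcs=128)
print(coords.shape, components.shape)  # (50, 50) (50, 4096)
```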

--ksample

Number of k-means samples to generate by clustering in voxel-PCA space. This is only recommended if volumes have been generated in --voldir for every particle in the dataset. Otherwise, the origin of k-means samples (latent-space clustering versus volume-space clustering) becomes easy to confuse.
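A common way to turn k-means clustering into concrete "samples" is to report the actual data point nearest each cluster center. The numpy sketch below assumes that convention, which may not match tomoDRGN's implementation exactly. It runs Lloyd's k-means on PC coordinates. For simplicity and determinism it uses a farthest-point initialization; production code typically uses k-means++ instead.

```python
import numpy as np


def kmeans_representatives(coords, k, n_iter=100):
    """Lloyd's k-means on PC coordinates.

    Returns per-volume labels, cluster centers, and, for each center, the index
    of the nearest actual volume (treated here as that cluster's sample).
    """
    # Farthest-point initialization: start from the first point, then repeatedly
    # add the point whose distance to all chosen centers is largest
    centers = [coords[0]]
    for _ in range(1, k):
        d2 = np.min([((coords - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(coords[d2.argmax()])
    centers = np.array(centers, dtype=float)

    for _ in range(n_iter):
        # Squared distance from every point to every center, shape (n_points, k)
        d2 = ((coords[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        labels = d2.argmin(axis=1)
        # Move each center to the mean of its points (keep it unchanged if the cluster is empty)
        new_centers = np.array([
            coords[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers

    d2 = ((coords[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
    labels = d2.argmin(axis=1)
    representatives = d2.argmin(axis=0)  # index of the volume closest to each center
    return labels, centers, representatives


# Toy PC coordinates: three well-separated groups of 30 volumes each
rng = np.random.default_rng(1)
coords = np.concatenate([rng.normal(loc=c, scale=0.1, size=(30, 2)) for c in (0.0, 5.0, 10.0)])
labels, centers, reps = kmeans_representatives(coords, k=3)
print(reps)
```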

--plot-format

Possible choices: png, svgz

File format with which to save plots

Default: 'png'

Mask generation arguments

--mask-path

Supply a custom real space mask instead of having tomoDRGN calculate a mask.

--mask

Possible choices: none, sphere, tight, soft

Type of real space mask to generate for each volume when calculating voxel-PCA. Note that tight and soft masks are calculated uniquely per volume.
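A shared mask (such as a sphere) and per-volume masks (tight and soft) differ only in whether one mask or one mask per volume is multiplied into the data. numpy broadcasting handles both cases, as sketched below. The default sphere radius of half the box and the helper names are assumptions for illustration. They are not taken from tomoDRGN's source.

```python
import numpy as np


def sphere_mask(boxsize, radius=None):
    """Binary spherical mask centered in a cubic box (default radius of half the box is an assumption)."""
    if radius is None:
        radius = boxsize / 2
    c = (boxsize - 1) / 2  # geometric center of the voxel grid
    z, y, x = np.ogrid[:boxsize, :boxsize, :boxsize]
    r2 = (x - c) ** 2 + (y - c) ** 2 + (z - c) ** 2
    return (r2 <= radius ** 2).astype(np.float32)


def apply_masks(volumes, masks):
    """Multiply volumes (N, D, D, D) by one shared mask (D, D, D) or by per-volume masks (N, D, D, D)."""
    return volumes * masks  # broadcasting covers both mask shapes


vols = np.ones((4, 32, 32, 32), dtype=np.float32)
shared = sphere_mask(32)                                                     # same mask for every volume
per_volume = np.stack([sphere_mask(32, radius=r) for r in (8, 10, 12, 14)])  # one mask per volume
print(apply_masks(vols, shared).shape, apply_masks(vols, per_volume).shape)
```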

--thresh

Isosurface percentile at which to threshold each volume; default is the 99th percentile. Only relevant for tight and soft masks.
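A percentile threshold keeps a fixed fraction of the brightest voxels regardless of a volume's absolute intensity scale. The sketch below assumes the percentile is taken over all voxel intensities of each volume, which yields a binary "tight" mask. With the default of 99, about 1% of voxels survive.

```python
import numpy as np


def tight_mask(volume, thresh=99.0):
    """Binary mask of voxels at or above the given intensity percentile (sketch of a tight mask)."""
    cutoff = np.percentile(volume, thresh)
    return (volume >= cutoff).astype(np.float32)


rng = np.random.default_rng(0)
vol = rng.normal(size=(32, 32, 32))
mask = tight_mask(vol)  # default: 99th percentile
print(mask.mean())      # roughly 0.01 of voxels are kept
```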

--dilate

Number of voxels by which to dilate the thresholded isosurface outwards from the mask boundary; default is 1/30th of the box size (px). Only relevant for soft mask.
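The default dilation scales with box size: a 96 px box gives about 3 voxels. The sketch below rounds that default to an integer, which is an assumption. It then grows a binary mask by repeated 6-connected (face-neighbor) dilation. That operation produces an octahedral rather than a perfectly spherical growth, and tomoDRGN may use a different structuring element or a distance-based dilation.

```python
import numpy as np


def default_dilate(boxsize):
    """Default dilation width: 1/30th of the box size (rounding to an int is an assumption)."""
    return int(round(boxsize / 30))


def dilate(mask, n_voxels):
    """Grow a binary 3D mask outwards by n_voxels steps of 6-connected dilation."""
    out = mask.astype(bool)
    for _ in range(n_voxels):
        padded = np.pad(out, 1)  # pad with False so growth never wraps around box edges
        grown = padded.copy()
        # OR each voxel with its six face neighbors
        for axis in range(3):
            grown |= np.roll(padded, 1, axis=axis)
            grown |= np.roll(padded, -1, axis=axis)
        out = grown[1:-1, 1:-1, 1:-1]  # crop the padding back off
    return out


seed = np.zeros((9, 9, 9), dtype=bool)
seed[4, 4, 4] = True
print(default_dilate(96), dilate(seed, 1).sum(), dilate(seed, 2).sum())
```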

--dist

Number of voxels over which to apply a soft cosine falling edge from the dilated mask boundary; default is 1/30th of the box size (px). Only relevant for soft mask.
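A cosine falling edge replaces the hard 0/1 step at the mask boundary with a smooth ramp, which limits Fourier artifacts from sharp edges. The sketch below uses a common raised-cosine form as a function of a voxel's distance outside the dilated boundary. tomoDRGN's exact formula may differ.

```python
import math


def soft_edge(distance, dist):
    """Mask value for a voxel `distance` voxels outside the dilated boundary, with a raised-cosine falloff of width `dist`."""
    if distance <= 0:
        return 1.0  # inside the dilated mask
    if distance >= dist:
        return 0.0  # beyond the soft edge
    return 0.5 * (1.0 + math.cos(math.pi * distance / dist))


profile = [round(soft_edge(d, 4), 3) for d in range(6)]
print(profile)  # [1.0, 0.854, 0.5, 0.146, 0.0, 0.0]
```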

Common next steps

  • Interactively explore the correlations among, and spatial context of, star file parameters, latent embeddings, and volume-space dimensionality reduction in the tomodrgn analyze Jupyter notebooks

  • Identify one (or more) sets of particle indices whose particles share a common feature (e.g. in volume space)

  • Filter the input star file by particle indices with tomodrgn filter_star

  • Generate an array of numeric labels describing a volume-space property of each particle, and use it to color volumes in tomogram mapbacks with tomodrgn subtomo2chimerax
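The two steps of identifying particle indices and then filtering the star file can be connected with a few lines of Python. The sketch below assumes one volume per particle, in particle order, as recommended for --ksample above. It selects the indices of particles whose k-means label falls in chosen clusters. The labels, file name, and pickled-list format are hypothetical; check `tomodrgn filter_star -h` for the index format it actually expects. The per-particle `labels` list is also an example of the kind of numeric label array described in the last step.

```python
import os
import pickle
import tempfile


def indices_in_clusters(labels, selected):
    """Return indices of particles whose cluster label is in `selected` (assumes one volume per particle, in order)."""
    wanted = set(selected)
    return [i for i, lab in enumerate(labels) if lab in wanted]


labels = [0, 2, 1, 2, 0, 2]  # hypothetical per-particle k-means labels
ind = indices_in_clusters(labels, [2])
print(ind)  # [1, 3, 5]

# Hypothetical output file; the format filter_star expects is an assumption to verify
out_path = os.path.join(tempfile.gettempdir(), "ind_cluster2.pkl")
with open(out_path, "wb") as f:
    pickle.dump(ind, f)
```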