tomodrgn analyze_volumes#

Purpose#

Run standard volume-space analyses of a train_vae model: dimensionality reduction and clustering of a volume ensemble.

Sample usage#

The examples below are adapted from tomodrgn/testing/commandtest*.py and rely on other outputs of commandtest.py to run successfully.

# Warp v1 style inputs
tomodrgn \
    analyze_volumes \
    --voldir output/vae_both_sim_zdim2/eval_vol_allz \
    --config output/vae_both_sim_zdim2/config.pkl \
    --outdir output/vae_both_sim_zdim2/eval_vol_allz_analyze_volumes_mask_soft \
    --ksample 20 \
    --mask soft

# WarpTools style inputs
tomodrgn \
    analyze_volumes \
    --voldir output/vae_warptools_70S_zdim2/eval_vol_allz \
    --config output/vae_warptools_70S_zdim2/config.pkl \
    --outdir output/vae_warptools_70S_zdim2/eval_vol_allz_analyze_volumes_mask_soft \
    --ksample 20 \
    --mask soft

Arguments#

usage: analyze_volumes [-h] --voldir VOLDIR --config CONFIG [--outdir OUTDIR]
                       [--num-pcs NUM_PCS] [--ksample KSAMPLE]
                       [--plot-format {png,svgz}] [--mask-path MASK_PATH]
                       [--mask {none,sphere,tight,soft}] [--thresh THRESH]
                       [--dilate DILATE] [--dist DIST]

Core arguments#

--voldir

path to directory containing volumes to analyze

--config

path to train_vae config file

--outdir

path to directory in which to save outputs. Defaults to the same parent directory and basename as voldir, with analyze_volumes appended

--num-pcs

number of principal components (PCs) to keep when saving PCA results and running UMAP

Default: 128
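
As a rough illustration of what "voxel-PCA" with a capped number of components means, the sketch below flattens a stack of volumes and projects them onto their leading principal components with NumPy. The function name `voxel_pca` and the random toy volumes are assumptions made for this example; this is not tomoDRGN's own implementation, which may also mask the volumes first and use a different PCA routine.

```python
import numpy as np

def voxel_pca(volumes, num_pcs=128):
    """Project flattened volumes onto their leading principal components.

    volumes: array of shape (n_volumes, D, D, D) (hypothetical input layout).
    Returns the PC coordinates and the explained variance ratio per PC.
    """
    n_vols = volumes.shape[0]
    # One row per volume, one column per voxel
    flat = volumes.reshape(n_vols, -1).astype(np.float64)
    # Center each voxel across the ensemble
    centered = flat - flat.mean(axis=0)
    # Thin SVD: rows of vt are the principal axes in voxel space
    u, s, vt = np.linalg.svd(centered, full_matrices=False)
    # Cannot keep more PCs than there are singular values
    k = min(num_pcs, s.shape[0])
    pcs = u[:, :k] * s[:k]
    explained = s[:k] ** 2 / np.sum(s ** 2)
    return pcs, explained

# Toy ensemble: 10 random 8x8x8 volumes (illustrative data only)
rng = np.random.default_rng(0)
volumes = rng.normal(size=(10, 8, 8, 8))
pcs, explained = voxel_pca(volumes, num_pcs=128)
print(pcs.shape)
```

With only 10 volumes, at most 10 components exist, so the requested 128 is capped at 10 in this toy case.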

--ksample

Number of k-means samples to generate by clustering voxel-PCA space. This is recommended only when volumes have been generated in --voldir for every particle in the dataset, to avoid confusion over whether a given k-means sample originated from latent space clustering or volume space clustering.
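
To make the idea of k-means sampling in voxel-PCA space concrete, here is a minimal NumPy sketch. It runs a plain Lloyd's k-means on PC coordinates and then picks, for each cluster center, the index of the nearest volume as a representative. The function name `kmeans_representatives`, the fixed iteration count, and the toy 2D data are all assumptions for illustration; tomoDRGN's actual clustering routine and selection rule may differ.

```python
import numpy as np

def kmeans_representatives(pcs, ksample, n_iter=50, seed=0):
    """Cluster PC coordinates and return labels plus one representative index per cluster."""
    rng = np.random.default_rng(seed)
    # Initialize centers from randomly chosen data points
    centers = pcs[rng.choice(len(pcs), size=ksample, replace=False)].copy()
    for _ in range(n_iter):
        # Distance from every point to every center: shape (n_points, ksample)
        dists = np.linalg.norm(pcs[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        for k in range(ksample):
            members = pcs[labels == k]
            if len(members) > 0:  # keep the old center if a cluster empties
                centers[k] = members.mean(axis=0)
    # Final assignment, and the data point closest to each center
    dists = np.linalg.norm(pcs[:, None, :] - centers[None, :, :], axis=2)
    labels = dists.argmin(axis=1)
    representatives = dists.argmin(axis=0)
    return labels, representatives

# Toy PC coordinates: three well-separated blobs of 20 points each
rng = np.random.default_rng(1)
pcs = np.concatenate(
    [rng.normal(loc=c, scale=0.1, size=(20, 2)) for c in (-3.0, 0.0, 3.0)]
)
labels, representatives = kmeans_representatives(pcs, ksample=3)
print(representatives)
```

Choosing an actual data point rather than the raw centroid means each representative corresponds to a volume that exists in the ensemble.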

--plot-format

Possible choices: png, svgz

File format with which to save plots

Default: 'png'

Mask generation arguments#

--mask-path

Supply a custom real space mask instead of having tomoDRGN calculate a mask.

--mask

Possible choices: none, sphere, tight, soft

Type of real space mask to generate for each volume when calculating voxel-PCA. Note that tight and soft masks are calculated separately for each volume.

--thresh

Isosurface percentile at which to threshold volume; default is to use 99th percentile. Only relevant for tight and soft masks.

--dilate

Number of voxels to dilate thresholded isosurface outwards from mask boundary; default is to use 1/30th of box size (px). Only relevant for soft mask.

--dist

Number of voxels over which to apply a soft cosine falling edge from dilated mask boundary; default is to use 1/30th of box size (px). Only relevant for soft mask.
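
The interplay of --thresh, --dilate and --dist can be sketched as follows: threshold the volume at a percentile to get a tight mask, extend it outward by `dilate` voxels, then roll off to zero over `dist` voxels with a cosine edge. This is a NumPy-only illustration under stated assumptions: the function name `soft_mask` is hypothetical, the default of one thirtieth of the box size is rounded down here, and distances are computed by brute force, which is only practical for small boxes (a library routine such as `scipy.ndimage.distance_transform_edt` would be the usual choice). It is not tomoDRGN's own code.

```python
import numpy as np

def soft_mask(vol, thresh_percentile=99.0, dilate=None, dist=None):
    """Build a soft real-space mask from a cubic volume (illustrative sketch)."""
    box = vol.shape[0]
    # Assumed rounding for the documented default of 1/30th of the box size
    dilate = box // 30 if dilate is None else dilate
    dist = box // 30 if dist is None else dist
    # Tight mask: voxels at or above the chosen percentile
    tight = vol >= np.percentile(vol, thresh_percentile)
    # Brute-force Euclidean distance from every voxel to the nearest tight-mask voxel
    axes = [np.arange(n) for n in vol.shape]
    grid = np.stack(np.meshgrid(*axes, indexing="ij"), axis=-1).reshape(-1, 3)
    inside = grid[tight.ravel()]
    d = np.linalg.norm(grid[:, None, :] - inside[None, :, :], axis=2).min(axis=1)
    d = d.reshape(vol.shape)
    mask = np.zeros(vol.shape)
    # Dilated region is fully inside the mask
    mask[d <= dilate] = 1.0
    # Cosine falling edge from 1 to 0 over the next `dist` voxels
    edge = (d > dilate) & (d <= dilate + dist)
    mask[edge] = 0.5 * (1.0 + np.cos(np.pi * (d[edge] - dilate) / dist))
    return mask

# Toy volume: a Gaussian blob in a 16-voxel box (illustrative data only)
n = 16
z, y, x = np.meshgrid(*[np.arange(n) - n / 2] * 3, indexing="ij")
vol = np.exp(-(x**2 + y**2 + z**2) / (2 * 3.0**2))
mask = soft_mask(vol, thresh_percentile=99.0, dilate=2, dist=3)
print(mask.min(), mask.max())
```

Setting `dist` to zero in this sketch leaves only the dilated binary region, with no soft edge.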

Common next steps#

Interactively explore correlations among, and the spatial context of, star file parameters, latent embeddings, and volume space dimensionality reduction in the tomodrgn analyze Jupyter notebooks
Identify one (or more) sets of particle indices whose particles share a common feature (e.g. in volume space)
Filter the input star file by particle indices with tomodrgn filter_star
Generate an array of numeric labels describing a volume space property for each particle to color volumes in tomogram mapbacks with tomodrgn subtomo2chimerax
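
As a small illustration of the index-selection and labeling steps above, the sketch below takes a per-particle array of volume-space cluster labels, extracts the indices of particles in one cluster, and derives a numeric array with one value per particle. The label values and variable names are hypothetical, and the on-disk format that tomodrgn filter_star and tomodrgn subtomo2chimerax expect for such arrays is not shown here; consult those commands' documentation before saving anything.

```python
import numpy as np

# Hypothetical volume-space cluster label for each of 7 particles
labels = np.array([0, 2, 1, 2, 2, 0, 1])

# Indices of particles sharing a common feature (here: membership in cluster 2)
selected_cluster = 2
ind_selected = np.flatnonzero(labels == selected_cluster)
print(ind_selected)  # → [1 3 4]

# One numeric value per particle, e.g. for coloring volumes by cluster
color_values = labels.astype(float)
assert color_values.shape[0] == labels.shape[0]
```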