tomodrgn convergence_vae

Purpose

Evaluate convergence of a train_vae model by assessing the stability of its latent space and reconstructed volumes across training epochs.

Sample usage

The examples below are adapted from tomodrgn/testing/commandtest*.py and rely on other outputs from commandtest.py to execute successfully.

# Warp v1 style inputs
tomodrgn \
    convergence_vae \
    output/vae_both_sim_zdim8_dosetiltweightmask_batchsize8 \
    --final-maxima 2 \
    --ground-truth data/10076_class*_32.mrc

# WarpTools style inputs
tomodrgn \
    convergence_vae \
    output/vae_warptools_70S_zdim8_dosetiltweightmask_batchsize8 \
    --final-maxima 2 \
    --ground-truth data/warptools_test_box-32_angpix-12_reconstruct.mrc

Arguments

usage: convergence_vae [-h] [--epoch EPOCH] [-o OUTDIR]
                       [--epoch-interval EPOCH_INTERVAL]
                       [--plot-format {png,svgz}] [--subset SUBSET]
                       [--random-seed RANDOM_SEED]
                       [--random-state RANDOM_STATE] [--skip-umap]
                       [--n-bins N_BINS] [--smooth SMOOTH]
                       [--smooth-width SMOOTH_WIDTH]
                       [--pruned-maxima PRUNED_MAXIMA] [--radius RADIUS]
                       [--final-maxima FINAL_MAXIMA] [--downsample DOWNSAMPLE]
                       [--lowpass LOWPASS] [--flip] [--invert] [--skip-volgen]
                       [--ground-truth GROUND_TRUTH [GROUND_TRUTH ...]]
                       [--mask {none,sphere,tight,soft}] [--thresh THRESH]
                       [--dilate DILATE] [--dist DIST]
                       workdir

Positional Arguments

workdir

Directory with tomoDRGN results

Named Arguments

--epoch

Latest epoch number N to include in the convergence analysis (0-based indexing, corresponding to weights.N.pkl), or “latest” to use the last detected epoch

Default: 'latest'
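As an illustration of how “latest” could be resolved to an epoch number, the sketch below scans a working directory for weights.N.pkl files and returns the largest N. The function names `find_latest_epoch` and `resolve_epoch` are hypothetical and are not part of tomoDRGN; the tool's own detection logic may differ.

```python
import re
from pathlib import Path


def find_latest_epoch(workdir):
    """Return the largest N for which weights.N.pkl exists in workdir.

    Hypothetical helper for illustration; not tomoDRGN's implementation.
    """
    pattern = re.compile(r"^weights\.(\d+)\.pkl$")
    epochs = []
    for path in Path(workdir).iterdir():
        match = pattern.match(path.name)
        if match:
            epochs.append(int(match.group(1)))
    if not epochs:
        raise FileNotFoundError(f"no weights.N.pkl files found in {workdir}")
    # Compare as integers so that epoch 10 ranks above epoch 9
    return max(epochs)


def resolve_epoch(workdir, epoch="latest"):
    """Turn an --epoch style value into a 0-based integer epoch."""
    return find_latest_epoch(workdir) if epoch == "latest" else int(epoch)
```

Comparing epoch numbers as integers rather than as strings matters here, because a lexicographic sort would place weights.9.pkl after weights.10.pkl.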

Core arguments

-o, --outdir

Output directory for convergence analysis results (default: [workdir]/convergence.[epoch])

--epoch-interval

Interval, in epochs, at which most convergence heuristics are calculated

Default: 5

--plot-format

Possible choices: png, svgz

File format with which to save plots

Default: 'png'

UMAP calculation arguments

--subset

Maximum number of particles to use for UMAP calculations. ‘None’ means use all particles

Default: 50000

--random-seed

Manually specify the seed used for selection of subset particles and other numpy operations

Default: 58600
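The interaction between --subset and --random-seed can be pictured with the sketch below, which draws a reproducible random subset of particle indices. It assumes NumPy's `default_rng` generator and a hypothetical `select_subset` helper; tomoDRGN may seed and sample differently, so the exact indices selected by the tool are not expected to match.

```python
import numpy as np


def select_subset(n_particles, subset=50000, random_seed=58600):
    """Return sorted particle indices to pass on to UMAP.

    Hypothetical helper: assumes sampling without replacement.
    """
    # subset=None (or a subset at least as large as the dataset) keeps every particle
    if subset is None or subset >= n_particles:
        return np.arange(n_particles)
    rng = np.random.default_rng(random_seed)
    return np.sort(rng.choice(n_particles, size=subset, replace=False))
```

Fixing the seed means that repeated runs embed the same particles, which keeps UMAP plots comparable between runs.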

--random-state

Random state seed used by UMAP for reproducibility, at a slight cost in performance. ‘None’ is slightly faster but non-reproducible

Default: 42

--skip-umap

Skip UMAP embedding. Requires that UMAP embeddings have already been computed for downstream calculations. Useful when tweaking volume generation settings.

Default: False

Sketching UMAP via local maxima arguments

--n-bins

Number of bins along each of the UMAP1 and UMAP2 axes

Default: 30

--smooth

Smooth the 2D histogram before identifying local maxima

Default: True

--smooth-width

Width of the Gaussian kernel for smoothing the 2D histogram, expressed as a multiple of one bin’s width

Default: 1.0

--pruned-maxima

Prune poorly separated maxima until this many maxima remain

Default: 12

--radius

Distance at which two maxima are considered poorly separated and become candidates for pruning (Euclidean distance in bin space)

Default: 5.0

--final-maxima

Select this many local maxima, sorted by highest bin count after pruning, for which to generate volumes

Default: 10
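The arguments in this group describe a small pipeline: bin the UMAP embedding, optionally smooth it, find local maxima, prune close neighbors, and keep the tallest peaks. The sketch below is one possible reading of that pipeline using NumPy and SciPy. The function `sketch_maxima`, the 3x3 peak neighborhood, and the rule of dropping the weaker peak of the closest pair are assumptions made for illustration, not tomoDRGN's actual implementation.

```python
import numpy as np
from scipy.ndimage import gaussian_filter, maximum_filter


def sketch_maxima(umap, n_bins=30, smooth=True, smooth_width=1.0,
                  pruned_maxima=12, radius=5.0, final_maxima=10):
    """Return (row, col) bin indices of selected local maxima, tallest first."""
    hist, _, _ = np.histogram2d(umap[:, 0], umap[:, 1], bins=n_bins)
    if smooth:
        # sigma is expressed in units of one bin width
        hist = gaussian_filter(hist, sigma=smooth_width)
    # Assumption: a peak is a non-empty bin equal to the max of its 3x3 neighborhood
    is_peak = (hist == maximum_filter(hist, size=3)) & (hist > 0)
    peaks = [tuple(p) for p in np.argwhere(is_peak)]
    # Assumption: while too many peaks remain, drop the weaker peak of the
    # closest pair that lies within `radius` bins of each other
    while len(peaks) > pruned_maxima:
        closest = None
        for i in range(len(peaks)):
            for j in range(i + 1, len(peaks)):
                d = np.hypot(peaks[i][0] - peaks[j][0], peaks[i][1] - peaks[j][1])
                if d <= radius and (closest is None or d < closest[0]):
                    closest = (d, i, j)
        if closest is None:
            break  # every remaining pair is already well separated
        _, i, j = closest
        peaks.pop(i if hist[peaks[i]] < hist[peaks[j]] else j)
    # Keep the tallest peaks for volume generation
    peaks.sort(key=lambda p: hist[p], reverse=True)
    return peaks[:final_maxima]
```

Under this reading, --pruned-maxima and --radius control how aggressively nearby peaks are merged, while --final-maxima only truncates the already pruned list.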

Volume generation arguments

--downsample

Downsample volumes to this box size (pixels)

--lowpass

Lowpass filter to this resolution in Å

--flip

Flip handedness of output volumes

Default: False

--invert

Invert contrast of output volumes

Default: False

--skip-volgen

Skip volume generation. Requires that volumes already exist for downstream CC and FSC calculations

Default: False

--ground-truth

Relative path, optionally containing wildcards, to ground truth volumes (e.g. ground_truth_vols*.mrc) for map-map CC calculations
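For readers unfamiliar with map-map correlation, the sketch below computes a correlation coefficient between two volumes held as NumPy arrays. It assumes that CC means the Pearson correlation over voxels and that any mask is applied by multiplication; the helper name `map_map_cc` is hypothetical, and tomoDRGN's exact definition may differ.

```python
import numpy as np


def map_map_cc(vol_a, vol_b, mask=None):
    """Pearson correlation between two equally shaped volumes."""
    a = np.asarray(vol_a, dtype=np.float64)
    b = np.asarray(vol_b, dtype=np.float64)
    if mask is not None:
        # Assumption: the mask weights both volumes before correlation
        a, b = a * mask, b * mask
    # Subtract the mean so the score ignores constant offsets
    a = a - a.mean()
    b = b - b.mean()
    return float((a * b).sum() / np.sqrt((a * a).sum() * (b * b).sum()))
```

A score near 1 indicates that a generated volume closely matches a ground truth volume up to scale and offset, while a score near 0 indicates no linear relationship.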

Mask generation arguments

--mask

Possible choices: none, sphere, tight, soft

Type of mask to generate for each volume when calculating volume-based metrics.

Default: 'soft'

--thresh

Isosurface percentile at which to threshold the volume; defaults to the 99th percentile.

--dilate

Number of voxels by which to dilate the thresholded isosurface outwards from the mask boundary; defaults to 1/30th of the box size (px).

--dist

Number of voxels over which to apply a soft cosine falling edge from the dilated mask boundary; defaults to 1/30th of the box size (px).
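The way --thresh, --dilate, and --dist could combine into a soft mask is sketched below using a Euclidean distance transform from SciPy. The function `soft_mask` is a hypothetical illustration under the defaults stated above (99th percentile, 1/30th of the box size); it is not taken from tomoDRGN, whose masking code may differ in detail.

```python
import numpy as np
from scipy.ndimage import distance_transform_edt


def soft_mask(vol, thresh=99.0, dilate=None, dist=None):
    """Build a soft-edged mask with values in [0, 1] from a cubic volume."""
    box = vol.shape[0]
    dilate = box // 30 if dilate is None else dilate
    dist = box // 30 if dist is None else dist
    # Threshold the volume at the requested percentile of voxel values
    binary = vol >= np.percentile(vol, thresh)
    # Distance, in voxels, from each voxel to the nearest above-threshold voxel
    d = distance_transform_edt(~binary)
    mask = np.zeros(vol.shape, dtype=np.float32)
    mask[d <= dilate] = 1.0  # thresholded region plus dilation
    if dist > 0:
        # Cosine edge falling from 1 at `dilate` to 0 at `dilate + dist`
        edge = (d > dilate) & (d <= dilate + dist)
        mask[edge] = 0.5 * (1.0 + np.cos(np.pi * (d[edge] - dilate) / dist))
    return mask
```

In this sketch, setting `dist=0` gives a hard-edged dilated mask, which may be a reasonable mental model for the difference between the tight and soft choices, though that correspondence is an assumption.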

Common next steps

  • Extend model training with tomodrgn train_vae [...] --load latest if the model has not yet converged

  • Analyze model at a particular epoch in latent space with tomodrgn analyze

  • Analyze model at a particular epoch in volume space with tomodrgn analyze_volumes