tomodrgn filter_star

Purpose

Filter a star file by selected particle indices or by selected class labels.

Sample usage

The examples below are adapted from tomodrgn/testing/commandtest*.py, and rely on other outputs from commandtest.py to execute successfully.

# Warp v1 style inputs -- image series star file filtered by particle indices
tomodrgn \
    filter_star \
    data/10076_both_32_sim.star \
    --starfile-type imageseries \
    --tomo-id-col _rlnImageName \
    --ind data/ind_ptcl_first10last10.pkl \
    -o output/10076_both_32_sim_filtered.star

# Warp v1 style inputs -- image series star file filtered by class labels
tomodrgn \
    filter_star \
    data/10076_both_32_sim.star \
    --starfile-type imageseries \
    --tomo-id-col _rlnImageName \
    --labels data/labels_D-0_E-1.pkl \
    --labels-sel 0 \
    -o output/10076_both_32_sim_filtered_by_labels.star

# Warp v1 style inputs -- volume series star file filtered by class labels
tomodrgn \
    filter_star \
    data/10076_both_32_sim_vols.star \
    --starfile-type volumeseries \
    --tomo-id-col _rlnImageName \
    --labels data/labels_D-0_E-1.pkl \
    --labels-sel 0 1 \
    -o output/10076_both_32_sim_vols_filtered_by_labels.star

# WarpTools style inputs -- filtered by class labels
tomodrgn \
    filter_star \
    data/warptools_test_4-tomos_10-ptcls_box-32_angpix-12_optimisation_set.star \
    --starfile-type optimisation_set \
    --tomo-id-col _rlnTomoName \
    --labels output/vae_warptools_70S_zdim8_dosetiltweightmask_batchsize8/analyze.39/kmeans20/labels.pkl \
    --labels-sel 0 1 2 3 4 \
    -o output/warptools_70S_filtered_by_labels_optimisation_set.star

Arguments

usage: filter_star [-h]
                   [--starfile-type {imageseries,volumeseries,optimisation_set}]
                   [--action {keep,drop}] [--tomogram TOMOGRAM]
                   [--tomo-id-col TOMO_ID_COL] -o O [--ind IND]
                   [--ind-type {particle,image}] [--labels LABELS]
                   [--labels-sel LABELS_SEL [LABELS_SEL ...]]
                   input

Positional Arguments

input: Input .star file

Core arguments

--starfile-type

Possible choices: imageseries, volumeseries, optimisation_set

Type of star file to filter. Select imageseries if rows correspond to particle images. Select volumeseries if rows correspond to particle volumes. Select optimisation_set if passing in an optimisation set star file.

Default: 'imageseries'

--action

Possible choices: keep, drop

Keep or drop the particles selected by --ind, or by --labels together with --labels-sel

Default: 'keep'

--tomogram

Optionally select by individual tomogram name (if all, writes individual star files per tomogram)

--tomo-id-col

Name of column in input starfile with unique values per tomogram

Default: '_rlnMicrographName'

-o

Output .star file (treated as output base name suffixed by tomogram name if specifying --tomogram). The output star file name must contain the string _optimisation_set if the input star file is of --starfile-type optimisation_set
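The naming rule for -o can be checked before launching a run. The sketch below is not part of tomodrgn; check_output_name is a hypothetical helper that only restates the rule described above.

```python
from pathlib import Path


def check_output_name(output: str, starfile_type: str) -> bool:
    """Hypothetical helper: True if `output` satisfies the -o naming rule.

    For optimisation_set inputs the output file name must contain the
    string "_optimisation_set"; other star file types are unconstrained.
    """
    if starfile_type == "optimisation_set":
        return "_optimisation_set" in Path(output).name
    return True


# Output path taken from the WarptTools-style sample command on this page.
ok = check_output_name(
    "output/warptools_70S_filtered_by_labels_optimisation_set.star",
    "optimisation_set",
)
# A name lacking the required string fails the check.
bad = check_output_name("output/filtered.star", "optimisation_set")
```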

Index-based filtering arguments

--ind

Path to selected indices array (.pkl)

--ind-type

Possible choices: particle, image

Use indices to filter by particle (multiple images) or by image (individual images). Only relevant for imageseries star files filtered using --ind.

Default: 'particle'
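An index file for --ind could be prepared along the following lines. This is a minimal sketch under several assumptions not stated on this page: that indices are zero-based, that they are stored as an integer numpy array, and that the file is written with the standard pickle module. The helper first_and_last and the particle count of 100 are hypothetical.

```python
import pickle
import tempfile
from pathlib import Path

import numpy as np


def first_and_last(n_particles: int, k: int = 10) -> np.ndarray:
    """Hypothetical helper: indices of the first k and last k particles."""
    return np.concatenate([np.arange(k), np.arange(n_particles - k, n_particles)])


# Assumption: the dataset holds 100 particles, indexed from zero.
ind = first_and_last(n_particles=100)

# Written to a temporary directory here; point this at your own data folder.
out_path = Path(tempfile.mkdtemp()) / "ind_ptcl_first10last10.pkl"
with open(out_path, "wb") as f:
    pickle.dump(ind, f)
```

The resulting file would then be passed as --ind, with --ind-type set according to whether the indices refer to particles or to individual images.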

Class-label-based filtering arguments

--labels

Path to labels array (.pkl). The labels.pkl must contain a 1-D numpy array of integer class labels with length matching the number of particles referenced in the star file to be filtered.

--labels-sel

Space-separated list of integer class labels to be selected (to be kept or dropped in accordance with --action)
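How --labels, --labels-sel and --action interact can be illustrated with a short sketch. It is not tomodrgn's implementation: selection_mask is a hypothetical function that follows the description above, validating a 1-D integer label array and returning which particles would be retained.

```python
import numpy as np


def selection_mask(labels, labels_sel, action="keep"):
    """Hypothetical helper: boolean mask of particles retained after filtering."""
    labels = np.asarray(labels)
    # The labels file is described as a 1-D array of integer class labels.
    if labels.ndim != 1 or not np.issubdtype(labels.dtype, np.integer):
        raise ValueError("labels must be a 1-D array of integers")
    if action not in ("keep", "drop"):
        raise ValueError("action must be 'keep' or 'drop'")
    selected = np.isin(labels, labels_sel)
    # keep retains the selected classes; drop retains everything else.
    return selected if action == "keep" else ~selected


# Made-up labels for six particles, one label per particle.
labels = np.array([0, 1, 2, 1, 0, 3])
keep = selection_mask(labels, [0, 1], action="keep")
drop = selection_mask(labels, [0, 1], action="drop")
```

With these made-up labels, selecting classes 0 and 1 retains four particles under keep and the remaining two under drop.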

Common next steps

Validate that the filtered particle subset is structurally homogeneous for the tomoDRGN-identified feature with tomodrgn backproject_voxel
Export this particle subset to external STA software
Train a new model on this subset of particles with tomodrgn train_vae to explore residual heterogeneity