tomodrgn filter_star
Purpose
Filter a star file by selected particle indices or by selected class labels.
Sample usage
The examples below are adapted from tomodrgn/testing/commandtest*.py, and rely on other outputs from commandtest.py to execute successfully.
# Warp v1 style inputs -- image series star file filtered by particle indices
tomodrgn \
filter_star \
data/10076_both_32_sim.star \
--starfile-type imageseries \
--tomo-id-col _rlnImageName \
--ind data/ind_ptcl_first10last10.pkl \
-o output/10076_both_32_sim_filtered.star
# Warp v1 style inputs -- image series star file filtered by class labels
tomodrgn \
filter_star \
data/10076_both_32_sim.star \
--starfile-type imageseries \
--tomo-id-col _rlnImageName \
--labels data/labels_D-0_E-1.pkl \
--labels-sel 0 \
-o output/10076_both_32_sim_filtered_by_labels.star
# Warp v1 style inputs -- volume series star file filtered by class labels
tomodrgn \
filter_star \
data/10076_both_32_sim_vols.star \
--starfile-type volumeseries \
--tomo-id-col _rlnImageName \
--labels data/labels_D-0_E-1.pkl \
--labels-sel 0 1 \
-o output/10076_both_32_sim_vols_filtered_by_labels.star
# WarpTools style inputs -- filtered by class labels
tomodrgn \
filter_star \
data/warptools_test_4-tomos_10-ptcls_box-32_angpix-12_optimisation_set.star \
--starfile-type optimisation_set \
--tomo-id-col _rlnTomoName \
--labels output/vae_warptools_70S_zdim8_dosetiltweightmask_batchsize8/analyze.39/kmeans20/labels.pkl \
--labels-sel 0 1 2 3 4 \
-o output/warptools_70S_filtered_by_labels_optimisation_set.star
Arguments
usage: filter_star [-h]
[--starfile-type {imageseries,volumeseries,optimisation_set}]
[--action {keep,drop}] [--tomogram TOMOGRAM]
[--tomo-id-col TOMO_ID_COL] -o O [--ind IND]
[--ind-type {particle,image}] [--labels LABELS]
[--labels-sel LABELS_SEL [LABELS_SEL ...]]
input
Positional Arguments
- input
Input .star file
Core arguments
- --starfile-type
Possible choices: imageseries, volumeseries, optimisation_set
Type of star file to filter. Select imageseries if rows correspond to particle images. Select volumeseries if rows correspond to particle volumes. Select optimisation_set if passing in an optimisation set star file.
Default:
'imageseries'
- --action
Possible choices: keep, drop
Keep or drop the particles selected by --ind or by --labels-sel
Default:
'keep'
- --tomogram
Optionally select by individual tomogram name (if all, then writes individual star files per tomogram)
- --tomo-id-col
Name of column in input starfile with unique values per tomogram
Default:
'_rlnMicrographName'
- -o
Output .star file (treated as output base name suffixed by tomogram name if specifying --tomogram). The output star file name must contain the string _optimisation_set if the input star file is of --starfile-type optimisation_set
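The naming rule for -o can be checked before launching a job. The sketch below is illustrative only and is not part of tomoDRGN: check_output_name is a hypothetical helper, and it assumes the rule applies to the file name rather than to the full path.

```python
from pathlib import PurePath


def check_output_name(output: str, starfile_type: str) -> None:
    """Raise ValueError if the output name breaks the optimisation_set naming rule.

    Hypothetical pre-flight check; assumes the rule concerns the file name only.
    """
    name = PurePath(output).name
    if starfile_type == "optimisation_set" and "_optimisation_set" not in name:
        raise ValueError(
            f"{name!r} must contain '_optimisation_set' "
            "when --starfile-type is optimisation_set"
        )


# Both output names come from the sample usage and pass the check.
check_output_name(
    "output/warptools_70S_filtered_by_labels_optimisation_set.star",
    "optimisation_set",
)
check_output_name("output/10076_both_32_sim_filtered.star", "imageseries")
```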
Index-based filtering arguments
- --ind
selected indices array (.pkl)
- --ind-type
Possible choices: particle, image
Use indices to filter by particle (multiple images) or by image (individual images). Only relevant for imageseries star files filtered using --ind
Default:
'particle'
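To illustrate the difference between the two --ind-type choices, the sketch below applies the same index array at particle and at image granularity to a toy image series table, then writes the array to a .pkl file. It is not tomoDRGN's implementation. It assumes that indices are zero-based and stored as a pickled NumPy array, neither of which is stated above; particle_of_row, rows_kept and the output file name are hypothetical.

```python
import pickle
import tempfile
from pathlib import Path

import numpy as np

# Hypothetical image series table: one row per tilt image,
# each row tagged with the particle it belongs to (3 particles x 3 images).
particle_of_row = np.array([0, 0, 0, 1, 1, 1, 2, 2, 2])


def rows_kept(ind, ind_type, particle_of_row):
    """Return the row numbers retained when keeping `ind` at the given granularity."""
    ind = np.asarray(ind)
    if ind_type == "particle":
        # Every image (row) of each selected particle is retained.
        return np.flatnonzero(np.isin(particle_of_row, ind))
    if ind_type == "image":
        # Indices address individual rows directly.
        return np.sort(ind)
    raise ValueError("ind_type must be 'particle' or 'image'")


ind = np.array([0, 2])
by_particle = rows_kept(ind, "particle", particle_of_row)
by_image = rows_kept(ind, "image", particle_of_row)

# Write the indices as a pickled array (assumed format for --ind).
out_path = Path(tempfile.mkdtemp()) / "ind_example.pkl"
with open(out_path, "wb") as f:
    pickle.dump(ind, f)
```

In this toy table, the particle-level selection retains all six images of particles 0 and 2, while the image-level selection retains only rows 0 and 2.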
Class-label-based filtering arguments
- --labels
path to labels array (.pkl). The labels.pkl must contain a 1-D numpy array of integer class labels with length matching the number of particles referenced in the star file to be filtered.
- --labels-sel
space-separated list of integer class labels to be selected (to be kept or dropped in accordance with --action)
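The interaction between --labels, --labels-sel and --action can be pictured as a boolean mask over particles. The following is a minimal sketch under that reading, not tomoDRGN's own code; select_by_labels and the example label values are hypothetical.

```python
import numpy as np


def select_by_labels(labels, labels_sel, action="keep"):
    """Return a boolean mask of the particles retained after label filtering."""
    labels = np.asarray(labels)
    # Mirror the documented requirement: a 1-D array of integer class labels.
    if labels.ndim != 1 or not np.issubdtype(labels.dtype, np.integer):
        raise ValueError("labels must be a 1-D array of integers")
    selected = np.isin(labels, labels_sel)
    if action == "keep":
        return selected
    if action == "drop":
        return ~selected
    raise ValueError("action must be 'keep' or 'drop'")


# One label per particle; the array length equals the number of particles.
labels = np.array([0, 1, 1, 2, 0, 3])
keep_mask = select_by_labels(labels, [0, 1], action="keep")
drop_mask = select_by_labels(labels, [0, 1], action="drop")
```

With these example labels, keep retains the four particles in classes 0 and 1, and drop retains the remaining two.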
Common next steps
- Validate that the filtered particle subset is structurally homogeneous for the tomoDRGN-identified feature with tomodrgn backproject_voxel
- Export this particle subset to external STA software
- Train a new model on this subset of particles with tomodrgn train_vae to explore residual heterogeneity