tomodrgn.starfile.TomoParticlesStarfile
- class TomoParticlesStarfile(starfile: str, source_software: Literal['auto', 'warptools', 'relion'] = 'auto')
Bases: GenericStarfile
Class to parse a particle star file from upstream subtomogram averaging (STA) software. The input star file must be an optimisation set star file from, e.g., WarpTools or RELION v5. The _rlnTomoParticlesFile referenced in the optimisation set must have each row describing a group of images observing a particular particle. The TomoParticlesStarfile is the object that is immediately loaded, though references to the parent optimisation set and the related _rlnTomoTomogramsFile are also stored (to reference a TomoTomogramsStarfile when loading tomogram-level metadata, and to write a new optimisation set pointing to the modified _rlnTomoParticlesFile contents).
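The optimisation set is essentially a pointer file: its entries name the particles and tomograms star files that are actually loaded. As a rough sketch of how those paths could be read, the hypothetical helper below assumes the optimisation set is a single non-loop data block of "_rlnKey value" lines and collects every _rln entry. It is not the tomodrgn parser, and the paths in the example are made up.

```python
# Hypothetical helper, not part of tomodrgn: collect the key/value pairs of an
# optimisation set, assuming it is a single non-loop block of "_rlnKey value" lines.


def parse_optimisation_set(text: str) -> dict:
    """Return {star file label: value} for every '_rln...' key/value line."""
    pairs = {}
    for line in text.splitlines():
        parts = line.split(maxsplit=1)
        # skip blank lines, comments, and block headers such as "data_"
        if len(parts) != 2 or not parts[0].startswith("_rln"):
            continue
        pairs[parts[0]] = parts[1].strip()
    return pairs


# Made-up paths for illustration only
example = """
data_

_rlnTomoTomogramsFile   Tomograms/job001/tomograms.star
_rlnTomoParticlesFile   Extract/job002/particles.star
"""

paths = parse_optimisation_set(example)
print(paths["_rlnTomoParticlesFile"])  # Extract/job002/particles.star
```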
Methods
filter – Filter the TomoParticlesStarfile in-place by image indices (e.g., the dataframe _rlnTomoVisibleFrames column) and particle indices (dataframe rows).
get_image_size
get_particles_stack – Load the particles referenced in the TomoParticlesStarfile.
get_ptcl_img_indices – Returns the indices of each tilt image and associated metadata relative to the pre-filtered subset of all images of all particles in the star file.
get_tiltseries_pixelsize – Returns the pixel size of the extracted particles in Ångstroms.
get_tiltseries_voltage – Returns the voltage of the microscope used to image the particles in kV.
identify_particles_data_block – Attempt to identify the block_name of the data block within the star file whose rows refer to particle data (as opposed to optics or other data).
make_test_train_split – Create indices for tilt images assigned to the train vs test split.
plot_particle_uid_ntilt_distribution – Plot the distribution of the number of visible tilt images per particle as a line plot (against star file particle index) and as a histogram.
write – Temporarily removes columns in the data_particles dataframe that are also present in the data_optics dataframe (to restore the expected input star file format), then calls the parent GenericStarfile write.
Attributes
df – Shortcut to access the particles dataframe associated with the TomoParticlesStarfile object.
headers_ctf – Shortcut to return headers associated with CTF parameters.
headers_rot – Shortcut to return headers associated with rotation parameters.
headers_trans – Shortcut to return headers associated with translation parameters.
- property df: DataFrame
Shortcut to access the particles dataframe associated with the TomoParticlesStarfile object.
- Returns:
pandas dataframe of particles metadata
- filter(ind_imgs: ndarray | str | None = None, ind_ptcls: ndarray | str | None = None, sort_ptcl_imgs: Literal['unsorted', 'dose_ascending', 'random'] = 'unsorted', use_first_ntilts: int = -1, use_first_nptcls: int = -1) → None
Filter the TomoParticlesStarfile in-place by image indices (e.g., the dataframe _rlnTomoVisibleFrames column) and particle indices (dataframe rows). Operations are applied in order: ind_imgs -> ind_ptcls -> sort_ptcl_imgs -> use_first_ntilts -> use_first_nptcls.
- Parameters:
ind_imgs – numpy array, or path to a numpy array, of integer image indices to preserve, shape (nimgs). Sets the value in the _rlnTomoVisibleFrames column to 0 for any image whose index is not in ind_imgs.
ind_ptcls – numpy array, or path to a numpy array, of integer particle indices to preserve, shape (nptcls). Drops particles from the dataframe whose index is not in ind_ptcls.
sort_ptcl_imgs – sort the star file images on a per-particle basis by the specified criterion. This is primarily useful in combination with use_first_ntilts to get the first ntilts images of each particle after sorting.
use_first_ntilts – keep the first use_first_ntilts images (of those images previously marked for inclusion by _rlnTomoVisibleFrames) of each particle in the sorted star file. Default -1 means to use all. Particles with fewer than this many tilt images are dropped.
use_first_nptcls – keep the first use_first_nptcls particles in the sorted star file. Default -1 means to use all.
- Returns:
None
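The order of operations above can be illustrated on plain 0/1 visibility lists. This is a hypothetical sketch, not the tomodrgn implementation: it assumes ind_imgs indexes the flattened concatenation of all particles' _rlnTomoVisibleFrames entries, assumes use_first_ntilts works by setting later visible frames to 0, and omits sort_ptcl_imgs (so it behaves like 'unsorted').

```python
import numpy as np


def filter_visible_frames(visible, ind_imgs=None, ind_ptcls=None,
                          use_first_ntilts=-1, use_first_nptcls=-1):
    """Hypothetical re-implementation of the filter order on 0/1 visibility lists.

    visible: one 0/1 sequence per particle (the _rlnTomoVisibleFrames values).
    Assumes ind_imgs indexes the flattened concatenation of all particles' frames.
    """
    visible = [np.asarray(v, dtype=int).copy() for v in visible]

    # 1) ind_imgs: zero out every frame whose flattened index is not preserved
    if ind_imgs is not None:
        keep_imgs = {int(i) for i in ind_imgs}
        offset = 0
        for v in visible:
            for j in range(len(v)):
                if offset + j not in keep_imgs:
                    v[j] = 0
            offset += len(v)

    # 2) ind_ptcls: drop particles (dataframe rows) that are not listed
    if ind_ptcls is not None:
        keep_ptcls = {int(i) for i in ind_ptcls}
        visible = [v for p, v in enumerate(visible) if p in keep_ptcls]

    # 3) sort_ptcl_imgs is omitted here, i.e. this sketch behaves like 'unsorted'

    # 4) use_first_ntilts: keep the first n visible frames, drop particles with fewer
    if use_first_ntilts > 0:
        kept = []
        for v in visible:
            on = np.flatnonzero(v)
            if len(on) < use_first_ntilts:
                continue
            v[on[use_first_ntilts:]] = 0
            kept.append(v)
        visible = kept

    # 5) use_first_nptcls: keep the first n remaining particles
    if use_first_nptcls > 0:
        visible = visible[:use_first_nptcls]

    return [v.tolist() for v in visible]


visible = [[1, 1, 0, 1], [1, 0, 0, 1], [1, 1, 1, 1]]
print(filter_visible_frames(visible, ind_ptcls=[0, 2], use_first_ntilts=2))
# [[1, 1, 0, 0], [1, 1, 0, 0]]
```

Applying ind_imgs before use_first_ntilts matters: an image removed by ind_imgs no longer counts toward a particle's first ntilts, so a particle can be dropped even though its original row had enough visible frames.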
- get_particles_stack(*, datadir: str | None = None, lazy: bool = False, check_headers: bool = False, **kwargs) → ndarray | list[LazyImageStack]
Load the particles referenced in the TomoParticlesStarfile. Particles are loaded into memory directly as a numpy array of shape (n_images, boxsize+1, boxsize+1), or as a list of mrc.LazyImageStack objects of length n_particles. The column specifying the path to images on disk must not specify the image index to load from that file (i.e., syntax like 1@/path/to/stack.mrcs is not supported). Instead, which images to load for each particle should be specified in the _rlnTomoVisibleFrames column.
- Parameters:
datadir – absolute path to particle images .mrcs to override particles_path_column.
lazy – whether to load particle images now in memory (False) or later on-the-fly (True).
check_headers – whether to parse each file’s header to ensure consistency in dtype and array shape in X,Y (True), or to use the first .mrc(s) file as representative of the dataset (False). Note that setting False is faster, but assumes that the first file’s header is representative of all files.
- Returns:
np.ndarray of shape (n_ptcls * n_tilts, D, D) or list of LazyImage objects of length (n_ptcls * n_tilts)
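Conceptually, the eager (lazy=False) path keeps, for each particle, only the frames flagged in _rlnTomoVisibleFrames and stacks them into one array. The sketch below shows that selection on in-memory arrays only; the function name is hypothetical, file I/O is omitted, it assumes each particle's stack stores its frames in the same order as its _rlnTomoVisibleFrames entry, and it does not reproduce any padding to boxsize+1.

```python
import numpy as np


def select_visible_images(stacks, visible):
    """Stack the visible images of every particle into one (n_images, D, D) array.

    stacks:  one array per particle, shape (n_frames, D, D), frames in
             _rlnTomoVisibleFrames order (an assumption of this sketch)
    visible: one 0/1 sequence per particle, length n_frames
    """
    selected = [stack[np.asarray(mask, dtype=bool)]
                for stack, mask in zip(stacks, visible)]
    return np.concatenate(selected, axis=0)


# Two toy particles with four 2x2 frames each; every pixel holds the particle index
stacks = [np.full((4, 2, 2), p, dtype=np.float32) for p in range(2)]
imgs = select_visible_images(stacks, [[1, 1, 0, 1], [1, 0, 0, 1]])
print(imgs.shape)  # (5, 2, 2)
```

Because frame selection is driven by the visibility mask, each particle's path only needs to name the whole stack, which is consistent with the 1@/path syntax being unsupported.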
- get_ptcl_img_indices() → list[ndarray]
Returns the indices of each tilt image and associated metadata relative to the pre-filtered subset of all images of all particles in the star file. Filtering is done using the self.header_ptcl_visible_frames column. For example, if the first two dataframe rows of this column are [[1,1,0,1],[1,0,0,1]], this method would return indices [np.array([0,1,2]), np.array([3,4])]. The number of tilt images per particle may vary across the STAR file, so returning a list (or object-type numpy array or ragged torch tensor) is required.
- Returns:
integer indices of each tilt image in the particles dataframe grouped by particle ID
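The worked example above follows from a running count of visible frames: each particle's indices start where the previous particle's visible images end. A minimal sketch of that arithmetic, using a hypothetical function name rather than the tomodrgn method itself:

```python
import numpy as np


def ptcl_img_indices(visible_frames):
    """Hypothetical sketch: group flattened image indices by particle."""
    counts = [int(np.sum(v)) for v in visible_frames]  # visible images per particle
    starts = np.cumsum([0] + counts[:-1])              # first index of each particle
    return [np.arange(s, s + c) for s, c in zip(starts, counts)]


indices = ptcl_img_indices([[1, 1, 0, 1], [1, 0, 0, 1]])
print(indices)  # [array([0, 1, 2]), array([3, 4])]
```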
- get_tiltseries_pixelsize() → float | int
Returns the pixel size of the extracted particles in Ångstroms. Assumes all particles have the same pixel size.
- Returns:
pixel size in Ångstroms/pixel
- get_tiltseries_voltage() → float | int
Returns the voltage of the microscope used to image the particles in kV.
- Returns:
voltage in kV
- property headers_ctf: list[str]
Shortcut to return headers associated with CTF parameters.
- Returns:
list of particles dataframe header names for CTF parameters
- property headers_rot: list[str]
Shortcut to return headers associated with rotation parameters.
- Returns:
list of particles dataframe header names for rotations
- property headers_trans: list[str]
Shortcut to return headers associated with translation parameters.
- Returns:
list of particles dataframe header names for translations
- identify_particles_data_block(column_substring: str = 'Angle') → str
Attempt to identify the block_name of the data block within the star file whose rows refer to particle data (as opposed to optics or other data).
- Parameters:
column_substring – substring to search for within column names to identify the particles block
- Returns:
the block name of the particles data block (e.g. data or data_particles)
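One way to picture this search: scan each block's column names for the substring and return the block that contains it (particle blocks carry angle columns such as _rlnAngleRot; optics blocks do not). The sketch below uses a hypothetical function name and a plain dict standing in for the parsed star file; raising an error when zero or several blocks match is a choice of this sketch, not documented tomodrgn behaviour.

```python
def identify_block(blocks, column_substring="Angle"):
    """Hypothetical sketch: return the name of the one block whose columns contain the substring.

    blocks: {block_name: list of column names}, standing in for a parsed star file.
    """
    matches = [name for name, columns in blocks.items()
               if any(column_substring in col for col in columns)]
    if len(matches) != 1:
        raise ValueError(f"expected one block matching {column_substring!r}, found {matches}")
    return matches[0]


blocks = {
    "data_optics": ["_rlnOpticsGroup", "_rlnVoltage", "_rlnImagePixelSize"],
    "data_particles": ["_rlnTomoName", "_rlnAngleRot", "_rlnAngleTilt", "_rlnAnglePsi"],
}
print(identify_block(blocks))  # data_particles
```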
- make_test_train_split(fraction_split1: float = 0.5, show_summary_stats: bool = True) → None
Create indices for tilt images assigned to the train vs test split. Images are randomly assigned to one set or the other, precisely respecting fraction_split1 on a per-particle basis. The random split is stored in self.df under the self.header_image_random_split column as a list of ints in (0, 1, 2), with the same length as the corresponding self.header_ptcl_visible_frames entry. These values map as follows:
0: images marked for exclusion (value 0) in self.header_ptcl_visible_frames.
1: images marked for inclusion (value 1) in self.header_ptcl_visible_frames, assigned to image-level half-set 1.
2: images marked for inclusion (value 1) in self.header_ptcl_visible_frames, assigned to image-level half-set 2.
- Parameters:
fraction_split1 – fraction of each particle’s included tilt images to label split1. All other included images will be labeled split2.
show_summary_stats – whether to log distribution statistics of particle sampling for the test/train splits
- Returns:
None
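The per-particle labelling described above can be sketched by shuffling the indices of a particle's included frames and cutting the shuffled list at fraction_split1. The function name is hypothetical, and rounding the split size down is an assumption of this sketch; tomodrgn may round differently.

```python
import numpy as np


def split_visible_frames(visible, fraction_split1=0.5, rng=None):
    """Hypothetical sketch: label one particle's frames 0 (excluded), 1 or 2 (half-sets)."""
    rng = np.random.default_rng() if rng is None else rng
    visible = np.asarray(visible)
    labels = np.zeros(len(visible), dtype=int)        # 0 = not included
    included = np.flatnonzero(visible == 1)
    n_split1 = int(fraction_split1 * len(included))   # floor, an assumption
    shuffled = rng.permutation(included)
    labels[shuffled[:n_split1]] = 1                   # image-level half-set 1
    labels[shuffled[n_split1:]] = 2                   # image-level half-set 2
    return labels.tolist()


rng = np.random.default_rng(0)
visible_frames = [[1, 1, 0, 1, 1, 0, 1, 1], [1, 0, 0, 1]]
splits = [split_visible_frames(v, 0.5, rng) for v in visible_frames]
print(splits)
```

Splitting within each particle, rather than across the whole dataset, guarantees that every particle contributes images to both half-sets in the requested proportion.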
- plot_particle_uid_ntilt_distribution(outpath: str) → None
Plot the distribution of the number of visible tilt images per particle as a line plot (against star file particle index) and as a histogram.
- Parameters:
outpath – file name to save the plot
- Returns:
None
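Both panels derive from one quantity: the number of visible frames per particle. The sketch below computes the line-plot series and the histogram counts with NumPy only; the function name is hypothetical, and the actual plotting (e.g. with matplotlib) is left out.

```python
import numpy as np


def ntilt_distribution(visible_frames):
    """Hypothetical sketch: visible tilts per particle, plus a {ntilts: n_particles} histogram."""
    ntilts = np.array([int(np.sum(v)) for v in visible_frames])  # y-values vs particle index
    values, counts = np.unique(ntilts, return_counts=True)       # histogram bins and heights
    return ntilts, dict(zip(values.tolist(), counts.tolist()))


ntilts, hist = ntilt_distribution([[1, 1, 0, 1], [1, 0, 0, 1], [1, 1, 1, 1]])
print(ntilts.tolist(), hist)  # [3, 2, 4] {2: 1, 3: 1, 4: 1}
```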
- write(outstar: str, *args, **kwargs) → None
Temporarily removes columns in the data_particles dataframe that are also present in the data_optics dataframe (to restore the expected input star file format), then calls the parent GenericStarfile write. Writes both the TomoParticles star file and the updated optimisation set star file pointing to the new TomoParticles star file. The TomoParticles star file is written to the same directory as the optimisation set star file, and has the same name as the optimisation set after removing the string _optimisation_set.
- Parameters:
outstar – name of the output optimisation set star file, optionally as an absolute or relative path. The filename should include the string _optimisation_set, e.g. run_optimisation_set.star.
args – passed to the parent GenericStarfile write
kwargs – passed to the parent GenericStarfile write
- Returns:
None
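The naming rule for the particles file can be expressed in a few lines of path handling. This is a hypothetical helper, not a tomodrgn function; raising an error when outstar lacks _optimisation_set is a choice of this sketch, since the documentation only says the filename should include it.

```python
import os


def tomo_particles_path(outstar: str) -> str:
    """Hypothetical sketch: derive the particles star file path from an optimisation set path."""
    directory, filename = os.path.split(outstar)
    if "_optimisation_set" not in filename:
        # failing loudly here is a design choice of this sketch
        raise ValueError(f"{filename!r} does not contain '_optimisation_set'")
    # same directory, same name with the '_optimisation_set' substring removed
    return os.path.join(directory, filename.replace("_optimisation_set", ""))


print(tomo_particles_path("run_optimisation_set.star"))  # run.star
```

For example, writing to out/run_optimisation_set.star would place the particles star file at out/run.star.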