tomodrgn.starfile.TiltSeriesStarfile
- class TiltSeriesStarfile(starfile: str, source_software: Literal['auto', 'warp', 'cryosrpnt', 'nextpyp', 'cistem'] = 'auto')
Bases: GenericStarfile
Class to parse a particle image-series star file from upstream subtomogram averaging (STA) software. Each row in the star file must describe an individual image of a particle; groups of related rows together describe all images of one particle.
Methods
filter – Filter the TiltSeriesStarfile in-place by image indices (rows) and particle indices (groups of rows corresponding to the same particle).
get_image_size – Returns the image size in pixels by loading the first image's header.
get_particles_stack – Calls parent GenericStarfile get_particles_stack.
get_ptcl_img_indices – Returns the indices of each tilt image in the particles dataframe grouped by particle ID.
get_tiltseries_pixelsize – Returns the pixel size of the extracted particles in Ångstroms.
get_tiltseries_voltage – Returns the voltage of the microscope used to image the particles in kV.
identify_particles_data_block – Attempt to identify the block_name of the data block within the star file for which rows refer to particle data (as opposed to optics or other data).
make_test_train_split – Create indices for tilt images assigned to train vs test split.
plot_particle_uid_ntilt_distribution – Plot the distribution of the number of tilt images per particle as a line plot (against star file particle index) and as a histogram.
write – Temporarily removes columns in data_particles dataframe that are present in data_optics dataframe (to restore expected input star file format), then calls parent GenericStarfile write.
Attributes
df – Shortcut to access the particles dataframe associated with the TiltSeriesStarfile object.
headers_ctf – Shortcut to return headers associated with CTF parameters.
headers_rot – Shortcut to return headers associated with rotation parameters.
headers_trans – Shortcut to return headers associated with translation parameters.
- property df: DataFrame
Shortcut to access the particles dataframe associated with the TiltSeriesStarfile object.
- Returns:
pandas dataframe of particles metadata
- filter(ind_imgs: ndarray | str | None = None, ind_ptcls: ndarray | str | None = None, sort_ptcl_imgs: Literal['unsorted', 'dose_ascending', 'random'] = 'unsorted', use_first_ntilts: int = -1, use_first_nptcls: int = -1) -> None
Filter the TiltSeriesStarfile in-place by image indices (rows) and particle indices (groups of rows corresponding to the same particle). Operations are applied in order: ind_imgs -> ind_ptcls -> sort_ptcl_imgs -> use_first_ntilts -> use_first_nptcls.
- Parameters:
ind_imgs – numpy array or path to numpy array of integer row indices to preserve, shape (N)
ind_ptcls – numpy array or path to numpy array of integer particle indices to preserve, shape (N)
sort_ptcl_imgs – sort the star file images on a per-particle basis by the specified criteria
use_first_ntilts – keep the first use_first_ntilts images of each particle in the sorted star file. Default -1 means to use all. Will drop particles with fewer than this many tilt images.
use_first_nptcls – keep the first use_first_nptcls particles in the sorted star file. Default -1 means to use all.
- Returns:
None
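The use_first_ntilts rule described above can be sketched in plain Python. This is a minimal illustration of the documented behaviour only, not tomoDRGN's implementation: the helper name keep_first_ntilts and the toy particle IDs are hypothetical, and the real method operates on the particles dataframe rather than on a list.

```python
from collections import defaultdict


def keep_first_ntilts(particle_ids, use_first_ntilts=-1):
    """Return the row indices to keep, given one particle ID per image row.

    Hypothetical helper mirroring the documented rule: -1 keeps every row;
    otherwise keep the first n rows of each particle and drop any particle
    that has fewer than n rows.
    """
    if use_first_ntilts == -1:
        return list(range(len(particle_ids)))

    # Collect row indices per particle, preserving row order.
    rows_by_particle = defaultdict(list)
    for row, pid in enumerate(particle_ids):
        rows_by_particle[pid].append(row)

    kept = []
    for rows in rows_by_particle.values():
        if len(rows) >= use_first_ntilts:
            kept.extend(rows[:use_first_ntilts])
    return sorted(kept)


# Toy data: particle "a" has 3 images, "b" has 2, "c" has 4.
ids = ["a", "a", "a", "b", "b", "c", "c", "c", "c"]
kept_rows = keep_first_ntilts(ids, use_first_ntilts=3)
print(kept_rows)  # [0, 1, 2, 5, 6, 7]: "b" is dropped, "c" is truncated
```

Because sorting is applied before this step, which images count as "first" depends on the sort_ptcl_imgs setting.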
- get_image_size(datadir: str | None = None) -> int
Returns the image size in pixels by loading the first image’s header. Assumes images are square.
- Parameters:
datadir – Relative or absolute path that overrides the path to the particle image .mrcs file specified in the STAR file
- Returns:
image size in pixels
- get_particles_stack(*, datadir: str | None = None, lazy: bool = False, **kwargs) -> ndarray | list[LazyImage]
Calls parent GenericStarfile get_particles_stack. Parent method parameters particles_block_name and particles_path_column are presupplied due to identification of these values during TiltSeriesStarfile instance creation.
- Parameters:
datadir – absolute path to particle images .mrcs to override particles_path_column
lazy – whether to load particle images now in memory (False) or later on-the-fly (True)
- Returns:
np.ndarray of shape (n_ptcls * n_tilts, D, D) or list of LazyImage objects of length (n_ptcls * n_tilts)
- get_ptcl_img_indices() -> list[ndarray[int]]
Returns the indices of each tilt image in the particles dataframe grouped by particle ID. The number of tilt images per particle may vary across the STAR file, so a list (or an object-dtype numpy array or ragged torch tensor) is required.
- Returns:
indices of each tilt image in the particles dataframe grouped by particle ID
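To illustrate why the result is ragged, here is a dependency-free sketch of grouping row indices by particle ID. It assumes one particle ID per image row; the function name group_rows_by_particle and the toy IDs are hypothetical, and the real method returns numpy arrays derived from the particles dataframe.

```python
def group_rows_by_particle(particle_ids):
    """Group row indices by particle ID, in order of first appearance.

    Hypothetical stand-in for the documented return value: one list of row
    indices per particle, with lengths that may differ between particles.
    """
    groups = {}
    for row, pid in enumerate(particle_ids):
        groups.setdefault(pid, []).append(row)
    return list(groups.values())


# Three particles observed in 3, 2 and 1 tilt images respectively.
ptcl_img_indices = group_rows_by_particle(["p1", "p1", "p1", "p2", "p2", "p3"])
print(ptcl_img_indices)  # [[0, 1, 2], [3, 4], [5]]
```

The unequal group lengths are what rule out a regular two-dimensional array.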
- get_tiltseries_pixelsize() -> float | int
Returns the pixel size of the extracted particles in Ångstroms. Assumes all particles have the same pixel size.
- Returns:
pixel size in Ångstroms/pixel
- get_tiltseries_voltage() -> float | int
Returns the voltage of the microscope used to image the particles in kV.
- Returns:
voltage in kV
- property headers_ctf: list[str]
Shortcut to return headers associated with CTF parameters.
- Returns:
list of particles dataframe header names for CTF parameters
- property headers_rot: list[str]
Shortcut to return headers associated with rotation parameters.
- Returns:
list of particles dataframe header names for rotations
- property headers_trans: list[str]
Shortcut to return headers associated with translation parameters.
- Returns:
list of particles dataframe header names for translations
- identify_particles_data_block(column_substring: str = 'Angle') -> str
Attempt to identify the block_name of the data block within the star file for which rows refer to particle data (as opposed to optics or other data).
- Parameters:
column_substring – Search pattern to identify as substring within column name for particles block
- Returns:
the block name of the particles data block (e.g. data or data_particles)
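The substring search can be pictured with a small sketch over a mapping of block names to column names. This is an assumption-laden illustration rather than the library's code: the function find_particles_block, the dictionary layout, and the choice to raise ValueError when nothing matches are all hypothetical. The column names follow common RELION conventions.

```python
def find_particles_block(blocks, column_substring="Angle"):
    """Return the first block name with a column containing column_substring.

    `blocks` maps block name -> list of column names (hypothetical layout).
    Raising ValueError on no match is an assumption made for this sketch.
    """
    for block_name, columns in blocks.items():
        if any(column_substring in column for column in columns):
            return block_name
    raise ValueError(f"no block has a column containing {column_substring!r}")


blocks = {
    "data_optics": ["_rlnOpticsGroup", "_rlnVoltage", "_rlnImagePixelSize"],
    "data_particles": ["_rlnImageName", "_rlnAngleRot", "_rlnAngleTilt", "_rlnAnglePsi"],
}
particles_block = find_particles_block(blocks)
print(particles_block)  # data_particles
```

The default 'Angle' works here because pose columns appear only in the particle block, not in the optics block.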
- make_test_train_split(fraction_split1: float = 0.5, show_summary_stats: bool = True) -> None
Create indices for tilt images assigned to train vs test split. Images are randomly assigned to one set or the other, respecting fraction_split1 on a per-particle basis. The random split is stored in self.df under the self.header_image_random_split column.
- Parameters:
fraction_split1 – fraction of each particle’s tilt images to label split1. All others will be labeled split2.
show_summary_stats – log distribution statistics of particle sampling for test/train splits
- Returns:
None
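A per-particle random split can be sketched as follows. This is not tomoDRGN's implementation: the function split_per_particle, the seed argument, the "split1"/"split2" label strings, and the use of round() to turn the fraction into a count are all assumptions made for illustration.

```python
import random


def split_per_particle(ptcl_img_indices, fraction_split1=0.5, seed=0):
    """Label each image row "split1" or "split2", particle by particle.

    `ptcl_img_indices` is a list of row-index lists, one per particle.
    The rounding rule and the label strings are assumptions of this sketch.
    """
    rng = random.Random(seed)
    labels = {}
    for rows in ptcl_img_indices:
        # Number of this particle's images that go to split1.
        n_split1 = round(len(rows) * fraction_split1)
        split1_rows = set(rng.sample(rows, n_split1))
        for row in rows:
            labels[row] = "split1" if row in split1_rows else "split2"
    return labels


# Two particles with 4 and 2 tilt images; half of each goes to split1.
labels = split_per_particle([[0, 1, 2, 3], [4, 5]], fraction_split1=0.5)
n_split1_first = sum(labels[row] == "split1" for row in [0, 1, 2, 3])
n_split1_second = sum(labels[row] == "split1" for row in [4, 5])
print(n_split1_first, n_split1_second)  # 2 1
```

Splitting within each particle, rather than across the whole file, keeps every particle represented in both sets whenever it has enough images.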
- plot_particle_uid_ntilt_distribution(outpath: str) -> None
Plot the distribution of the number of tilt images per particle as a line plot (against star file particle index) and as a histogram.
- Parameters:
outpath – file name to save the plot
- Returns:
None
- write(*args, **kwargs) -> None
Temporarily removes columns in data_particles dataframe that are present in data_optics dataframe (to restore expected input star file format), then calls parent GenericStarfile write.
- Parameters:
args – Passed to parent GenericStarfile write
kwargs – Passed to parent GenericStarfile write
- Returns:
None