tomodrgn.starfile.TiltSeriesStarfile

class TiltSeriesStarfile(starfile: str, source_software: Literal['auto', 'warp', 'cryosrpnt', 'nextpyp', 'cistem'] = 'auto')

Bases: GenericStarfile

Class to parse a particle image-series star file from upstream subtomogram averaging (STA) software. Each row in the star file must describe an individual image of a particle; groups of related rows describe all images observing one particle.

Methods

filter

Filter the TiltSeriesStarfile in-place by image indices (rows) and particle indices (groups of rows corresponding to the same particle).

get_image_size

Returns the image size in pixels by loading the first image's header.

get_particles_stack

Calls parent GenericStarfile get_particles_stack.

get_ptcl_img_indices

Returns the indices of each tilt image in the particles dataframe grouped by particle ID.

get_tiltseries_pixelsize

Returns the pixel size of the extracted particles in Ångstroms.

get_tiltseries_voltage

Returns the voltage of the microscope used to image the particles in kV.

identify_particles_data_block

Attempt to identify the block_name of the data block within the star file for which rows refer to particle data (as opposed to optics or other data).

make_test_train_split

Create indices for tilt images assigned to train vs test split.

plot_particle_uid_ntilt_distribution

Plot the distribution of the number of tilt images per particle as a line plot (against star file particle index) and as a histogram.

write

Temporarily removes columns in the data_particles dataframe that are also present in the data_optics dataframe (to restore the expected input star file format), then calls the parent GenericStarfile write.

Attributes

df

Shortcut to access the particles dataframe associated with the TiltSeriesStarfile object.

headers_ctf

Shortcut to return headers associated with CTF parameters.

headers_rot

Shortcut to return headers associated with rotation parameters.

headers_trans

Shortcut to return headers associated with translation parameters.

property df: DataFrame

Shortcut to access the particles dataframe associated with the TiltSeriesStarfile object.

Returns:

pandas dataframe of particles metadata

filter(ind_imgs: ndarray | str | None = None, ind_ptcls: ndarray | str | None = None, sort_ptcl_imgs: Literal['unsorted', 'dose_ascending', 'random'] = 'unsorted', use_first_ntilts: int = -1, use_first_nptcls: int = -1) → None

Filter the TiltSeriesStarfile in-place by image indices (rows) and particle indices (groups of rows corresponding to the same particle). Operations are applied in order: ind_imgs -> ind_ptcls -> sort_ptcl_imgs -> use_first_ntilts -> use_first_nptcls.

Parameters:
  • ind_imgs – numpy array or path to numpy array of integer row indices to preserve, shape (N)

  • ind_ptcls – numpy array or path to numpy array of integer particle indices to preserve, shape (N)

  • sort_ptcl_imgs – sort the star file images on a per-particle basis by the specified criterion ('unsorted', 'dose_ascending', or 'random')

  • use_first_ntilts – keep the first use_first_ntilts images of each particle in the sorted star file. Default -1 uses all images. Particles with fewer than this many tilt images are dropped.

  • use_first_nptcls – keep the first use_first_nptcls particles in the sorted star file. Default -1 uses all particles.

Returns:

None
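The documented order of operations can be sketched with plain pandas. This is a hypothetical re-creation, not tomodrgn's implementation. The function name `filter_sketch`, the column-name arguments `ptcl_col` and `dose_col`, and the choice to return a new dataframe (rather than filtering in place) are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd


def filter_sketch(df: pd.DataFrame, ptcl_col: str, dose_col: str,
                  ind_imgs=None, ind_ptcls=None, sort_ptcl_imgs: str = "unsorted",
                  use_first_ntilts: int = -1, use_first_nptcls: int = -1,
                  seed: int = 0) -> pd.DataFrame:
    """Hypothetical sketch of the documented filter order; returns a new dataframe."""
    # 1. ind_imgs: keep the selected rows (positional indices), in original row order
    if ind_imgs is not None:
        df = df.iloc[np.sort(np.asarray(ind_imgs))]
    # 2. ind_ptcls: keep the selected particles, numbered by order of first appearance
    if ind_ptcls is not None:
        uids = df[ptcl_col].unique()
        df = df[df[ptcl_col].isin(uids[np.asarray(ind_ptcls)])]
    # 3. sort_ptcl_imgs: reorder images within each particle, keeping particle order fixed
    if sort_ptcl_imgs != "unsorted":
        ptcl_order = pd.factorize(df[ptcl_col])[0]
        if sort_ptcl_imgs == "dose_ascending":
            within = df[dose_col].to_numpy()
        elif sort_ptcl_imgs == "random":
            within = np.random.default_rng(seed).random(len(df))
        else:
            raise ValueError(f"unknown sort_ptcl_imgs: {sort_ptcl_imgs}")
        # np.lexsort sorts by the last key first, so particle order is the primary key
        df = df.iloc[np.lexsort((within, ptcl_order))]
    # 4. use_first_ntilts: drop particles with too few images, then keep the first n of each
    if use_first_ntilts > 0:
        counts = df.groupby(ptcl_col, sort=False)[ptcl_col].transform("size")
        df = df[counts >= use_first_ntilts]
        df = df.groupby(ptcl_col, sort=False).head(use_first_ntilts)
    # 5. use_first_nptcls: keep only the first n particles
    if use_first_nptcls > 0:
        keep = df[ptcl_col].unique()[:use_first_nptcls]
        df = df[df[ptcl_col].isin(keep)]
    return df
```

In this sketch, sorting before truncation means that combining `sort_ptcl_imgs='dose_ascending'` with `use_first_ntilts=n` keeps the n lowest-dose images of each particle.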

get_image_size(datadir: str | None = None) → int

Returns the image size in pixels by loading the first image’s header. Assumes images are square.

Parameters:

datadir – Relative or absolute path used to override the path to the particle image .mrcs files specified in the STAR file

Returns:

image size in pixels
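Here is a minimal sketch of reading the image size from a header. It assumes the images are stored in MRC format, where NX, NY and NZ are the first three 32-bit integers of the 1024-byte header. The helper name is hypothetical. The sketch also assumes little-endian byte order instead of checking the header's machine stamp.

```python
import struct


def image_size_from_mrc_header(path: str) -> int:
    """Hypothetical helper: return the box size D of a stack of square D x D images."""
    with open(path, "rb") as f:
        header = f.read(12)  # NX, NY, NZ: the first three 32-bit integers of an MRC header
    nx, ny, nz = struct.unpack("<3i", header)  # assumes little-endian byte order
    if nx != ny:
        raise ValueError(f"expected square images, got {nx} x {ny}")
    return nx
```

In practice, the `mrcfile` package can parse the full header without loading image data (`mrcfile.open(path, header_only=True)`).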

get_particles_stack(*, datadir: str | None = None, lazy: bool = False, **kwargs) → ndarray | list[LazyImage]

Calls parent GenericStarfile get_particles_stack. The parent method parameters particles_block_name and particles_path_column are supplied automatically, because their values are identified when the TiltSeriesStarfile instance is created.

Parameters:
  • datadir – absolute path to the particle image .mrcs files, overriding the path given in particles_path_column

  • lazy – whether to load particle images now in memory (False) or later on-the-fly (True)

Returns:

np.ndarray of shape (n_ptcls * n_tilts, D, D) or list of LazyImage objects of length (n_ptcls * n_tilts)
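The difference between `lazy=False` and `lazy=True` can be illustrated with NumPy memory mapping. `LazyImageSketch` and `load_stack_sketch` are hypothetical stand-ins, not tomodrgn's `LazyImage`. They assume a stack of float32 images stored contiguously after a fixed-size header (`header_bytes`). This matches the layout of MRC mode-2 data, except that any extended header must be included in `header_bytes`.

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class LazyImageSketch:
    """Hypothetical lazy handle: remembers where one image lives and reads it only on get()."""
    path: str
    index: int
    D: int
    header_bytes: int = 0

    def get(self) -> np.ndarray:
        # Byte offset of this image: skip the header, then `index` images of D*D float32 values
        offset = self.header_bytes + self.index * self.D * self.D * np.dtype(np.float32).itemsize
        mm = np.memmap(self.path, dtype=np.float32, mode="r", offset=offset, shape=(self.D, self.D))
        return np.array(mm)  # copy into memory so the file mapping can be released


def load_stack_sketch(path: str, n_images: int, D: int, lazy: bool = False, header_bytes: int = 0):
    if lazy:
        # Nothing is read yet; each handle reads its own image when get() is called
        return [LazyImageSketch(path, i, D, header_bytes) for i in range(n_images)]
    # Eager: read all images now into one (n_images, D, D) array
    data = np.fromfile(path, dtype=np.float32, count=n_images * D * D, offset=header_bytes)
    return data.reshape(n_images, D, D)
```

Eager loading pays the memory cost of the whole stack up front. Lazy loading keeps only small handles in memory and pays a read each time an image is accessed.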

get_ptcl_img_indices() → list[ndarray[int]]

Returns the indices of each tilt image in the particles dataframe grouped by particle ID. Because the number of tilt images per particle may vary across the STAR file, a list (or an object-type numpy array or ragged torch tensor) is required.

Returns:

indices of each tilt image in the particles dataframe grouped by particle ID
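One way to build these ragged groups is to factorize the particle ID column and split a stable argsort of the resulting codes. The column name passed as `ptcl_col` is an assumption, since the header used to group images depends on the source software. In this sketch, particles are numbered in order of first appearance.

```python
import numpy as np
import pandas as pd


def ptcl_img_indices_sketch(df: pd.DataFrame, ptcl_col: str) -> list[np.ndarray]:
    """Hypothetical helper: positional row indices of each particle's images."""
    # codes[i] is the particle number of row i, assigned in order of first appearance
    codes, uniques = pd.factorize(df[ptcl_col])
    # A stable argsort groups rows by particle while preserving row order within a particle
    order = np.argsort(codes, kind="stable")
    # Split the grouped rows at the cumulative image counts of each particle
    boundaries = np.cumsum(np.bincount(codes, minlength=len(uniques)))[:-1]
    return np.split(order, boundaries)
```

Each returned array can index a flat image stack of shape (n_imgs, D, D). For example, `stack[groups[0]]` gives all images of the first particle.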

get_tiltseries_pixelsize() → float | int

Returns the pixel size of the extracted particles in Ångstroms. Assumes all particles have the same pixel size.

Returns:

pixel size in Ångstroms/pixel

get_tiltseries_voltage() → float | int

Returns the voltage of the microscope used to image the particles in kV.

Returns:

voltage in kV
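The pixel size getter assumes that a single value is shared by all particles, and a voltage lookup naturally makes the same assumption. The sketch below shows that check. It uses the RELION labels `_rlnImagePixelSize` and `_rlnVoltage` for illustration only. Which columns (and which data block) tomodrgn actually reads depends on the source software, and the helper name is hypothetical.

```python
import pandas as pd


def unique_column_value(df: pd.DataFrame, column: str):
    """Hypothetical helper: return the single value of `column`, or fail if it varies."""
    values = df[column].unique()
    if len(values) != 1:
        raise ValueError(f"expected one value in {column}, found {len(values)}")
    return values[0]
```

For example, `unique_column_value(df, "_rlnImagePixelSize")` returns the pixel size in Å/pixel, and `unique_column_value(df, "_rlnVoltage")` returns the voltage in kV.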

property headers_ctf: list[str]

Shortcut to return headers associated with CTF parameters.

Returns:

list of particles dataframe header names for CTF parameters

property headers_rot: list[str]

Shortcut to return headers associated with rotation parameters.

Returns:

list of particles dataframe header names for rotations

property headers_trans: list[str]

Shortcut to return headers associated with translation parameters.

Returns:

list of particles dataframe header names for translations
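The header lists are mainly useful for pulling parameter columns out of `df` as numeric arrays. The RELION-style labels below are illustrative assumptions. The exact names returned by `headers_rot`, `headers_trans` and `headers_ctf` may differ, and `columns_as_array` is a hypothetical helper.

```python
import numpy as np
import pandas as pd

# Illustrative RELION-style labels; the lists the properties return may differ
HEADERS_ROT = ["_rlnAngleRot", "_rlnAngleTilt", "_rlnAnglePsi"]
HEADERS_TRANS = ["_rlnOriginXAngst", "_rlnOriginYAngst"]


def columns_as_array(df: pd.DataFrame, headers: list[str]) -> np.ndarray:
    """Hypothetical helper: select the listed columns as an (n_imgs, len(headers)) array."""
    return df[headers].to_numpy(dtype=np.float32)
```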

identify_particles_data_block(column_substring: str = 'Angle') → str

Attempt to identify the block_name of the data block within the star file for which rows refer to particle data (as opposed to optics or other data).

Parameters:

column_substring – Search pattern to identify as substring within column name for particles block

Returns:

the block name of the particles data block (e.g. data or data_particles)
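This sketch shows the substring search over data blocks. It assumes the star file has already been parsed into a dict mapping block names to dataframes. The helper name is hypothetical, and raising an error when zero or several blocks match is an assumption rather than documented behavior.

```python
import pandas as pd


def identify_particles_block_sketch(blocks: dict[str, pd.DataFrame],
                                    column_substring: str = "Angle") -> str:
    """Hypothetical helper: name of the one block with a column containing `column_substring`."""
    matches = [name for name, block in blocks.items()
               if any(column_substring in str(col) for col in block.columns)]
    if len(matches) != 1:
        raise ValueError(f"expected one matching block, found {matches}")
    return matches[0]
```

Optics blocks typically hold per-optics-group settings such as voltage but no Euler angles, which is why a substring like 'Angle' can tell the particles block apart.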

make_test_train_split(fraction_split1: float = 0.5, show_summary_stats: bool = True) → None

Create indices for tilt images assigned to train vs test split. Images are randomly assigned to one set or the other on a per-particle basis, respecting fraction_split1 within each particle. The random split is stored in self.df under the self.header_image_random_split column.

Parameters:
  • fraction_split1 – fraction of each particle’s tilt images to label split1. All others will be labeled split2.

  • show_summary_stats – log distribution statistics of particle sampling for test/train splits

Returns:

None
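Here is a minimal sketch of a per-particle random split. It returns a boolean mask (True for split1) instead of writing a column to the dataframe. The function name is hypothetical. The rounding rule for `fraction_split1 * n_images` is an assumption, and tomodrgn may round differently.

```python
import numpy as np


def per_particle_split_sketch(ptcl_img_indices, n_imgs: int,
                              fraction_split1: float = 0.5, seed: int = 0) -> np.ndarray:
    """Hypothetical helper: boolean mask over all images, True where the image is in split1."""
    rng = np.random.default_rng(seed)
    in_split1 = np.zeros(n_imgs, dtype=bool)
    for inds in ptcl_img_indices:
        # Assumption: round to nearest (Python's round() rounds halves to even)
        n_split1 = int(round(fraction_split1 * len(inds)))
        # Choose this particle's split1 images without replacement
        chosen = rng.choice(np.asarray(inds), size=n_split1, replace=False)
        in_split1[chosen] = True
    return in_split1
```

Splitting within each particle, rather than across the whole dataset, ensures that every particle contributes images to both sets.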

plot_particle_uid_ntilt_distribution(outpath: str) → None

Plot the distribution of the number of tilt images per particle as a line plot (against star file particle index) and as a histogram.

Parameters:

outpath – file name under which to save the plot

Returns:

None
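The sketch below draws the two-panel figure with matplotlib. It takes the number of tilt images per particle as input, for example `[len(i) for i in get_ptcl_img_indices()]`. The function name, figure layout and axis labels are assumptions.

```python
import matplotlib

matplotlib.use("Agg")  # render to a file without needing a display
import matplotlib.pyplot as plt
import numpy as np


def plot_ntilt_distribution_sketch(ntilts_per_ptcl, outpath: str) -> None:
    """Hypothetical helper: line plot vs particle index, plus a histogram, saved to outpath."""
    ntilts = np.asarray(ntilts_per_ptcl)
    fig, (ax_line, ax_hist) = plt.subplots(1, 2, figsize=(8, 3))
    # Left panel: tilt count for each particle, in star file order
    ax_line.plot(np.arange(len(ntilts)), ntilts)
    ax_line.set_xlabel("star file particle index")
    ax_line.set_ylabel("number of tilt images")
    # Right panel: histogram with one bin centered on each integer tilt count
    ax_hist.hist(ntilts, bins=np.arange(ntilts.min(), ntilts.max() + 2) - 0.5)
    ax_hist.set_xlabel("number of tilt images")
    ax_hist.set_ylabel("number of particles")
    fig.tight_layout()
    fig.savefig(outpath)
    plt.close(fig)
```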

write(*args, **kwargs) → None

Temporarily removes columns in the data_particles dataframe that are also present in the data_optics dataframe (to restore the expected input star file format), then calls the parent GenericStarfile write.

Parameters:
  • args – Passed to parent GenericStarfile write

  • kwargs – Passed to parent GenericStarfile write

Returns:

None
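The remove-write-restore pattern can be sketched with pandas. A `try`/`finally` ensures the removed columns are restored even if writing fails. `write_fn` is a hypothetical placeholder for the parent write call, and the sketch assumes that column names are unique.

```python
from typing import Callable

import pandas as pd


def write_without_optics_columns(particles: pd.DataFrame, optics: pd.DataFrame,
                                 write_fn: Callable[[pd.DataFrame], None]) -> None:
    """Hypothetical helper: write `particles` without optics-duplicated columns, then restore them."""
    # Columns duplicated from the optics block, in their original left-to-right order
    shared = [col for col in particles.columns if col in optics.columns]
    positions = {col: particles.columns.get_loc(col) for col in shared}
    removed = particles[shared].copy()
    particles.drop(columns=shared, inplace=True)
    try:
        write_fn(particles)
    finally:
        # Re-insert in ascending position order so each original position is valid again
        for col in shared:
            particles.insert(positions[col], col, removed[col])
```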