interferences.table.build

interferences.table.build.build_table(elements=None, max_atoms=3, sortby=['m_z', 'charge', 'mass'], charges=[1, 2], add_labels=False, threshold=None, window=None, cache_results=True)[source]

Build the interferences table.

Parameters
  • elements (list) – List of elements to include in the table.

  • max_atoms (int) – Largest size of molecule to build, in atoms.

  • sortby (str | list) – Column or list of columns to sort the final table by.

  • charges (list ( int )) – Ionic charges to include in the model.

  • add_labels (bool) – Whether to produce molecule names which are nicely formatted. This takes additional computation time.

  • threshold (float) – Threshold for isotopic abundance for inclusion of low-abundance/non-stable isotopes.

  • window (tuple) – Window of interest to filter out irrelevant examples (here a mass window, which directly translates to an m/z window with z=1).

  • cache_results (bool) – Whether to store the results on disk for

Todo

Consider options for parallelizing this to reduce build time. This would allow larger molecules to be included.

Invalid molecules (e.g. H{2+}) will currently be present, but will ideally be filtered out.

In some cases, mass peaks will be duplicated, and we want to keep the simplest version (e.g. Ar[40]+ and Ar[40]2{2+}). We here remove duplicate mass peaks before sorting (i.e. take the first one, as higher charges would be penalised), but we could potentially add a check that both contain the same isotopic components for verification (this would be slow).

While “m/z” would be an appropriate column name, it can’t be used in HDF indexes.
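The duplicate-peak note above can be illustrated with a minimal standalone sketch. It does not use the package itself: the `ISOTOPE_MASS` values are approximate, the electron mass is ignored, and `m_z` and `drop_duplicate_peaks` are hypothetical helper names chosen for this example only.

```python
# Approximate isotope masses in atomic mass units (illustrative values only).
ISOTOPE_MASS = {"Ar[40]": 39.9624, "Ca[40]": 39.9626}


def m_z(components, charge):
    """Mass-to-charge ratio for a list of isotope labels at a given charge."""
    return sum(ISOTOPE_MASS[c] for c in components) / charge


def drop_duplicate_peaks(ions, decimals=4):
    """Keep the first ion seen at each rounded m/z, visiting simpler ions first."""
    # Sort by charge, then atom count, so the simplest ion claims each peak.
    ordered = sorted(ions, key=lambda ion: (ion[1], len(ion[0])))
    seen = {}
    for components, charge in ordered:
        key = round(m_z(components, charge), decimals)
        seen.setdefault(key, (components, charge))
    return list(seen.values())


# Candidate ions as (components, charge) pairs.
ions = [
    (["Ar[40]", "Ar[40]"], 2),  # Ar[40]2{2+}, same m/z as Ar[40]+
    (["Ar[40]"], 1),
    (["Ca[40]"], 1),
]
unique_ions = drop_duplicate_peaks(ions)
```

Here Ar[40]2{2+} shares its m/z with Ar[40]+ and is dropped, while Ca[40]+ is kept as a distinct peak.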

interferences.table.combinations

Functions for calculating combinations (in the combinatorics sense) of elements and isotopes into isotope-specified molecular ions.

interferences.table.combinations.get_elemental_combinations(elements, max_atoms=3)[source]

Combine a list of elements into lists of molecular combinations up to a maximum number of atoms per molecule. Successively adds smaller molecules until down to single atoms.

Parameters
  • elements (list) – Elements or isotopes to combine into molecules.

  • max_atoms (int) – Maximum number of atoms per molecule. This limits the number of molecules returned to the generally most relevant simple molecules.

Todo

Check that isotopes supplied to this function are propagated.
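As a rough illustration of the behaviour described for get_elemental_combinations, the sketch below enumerates unordered combinations with the standard library, starting from the largest molecules and working down to single atoms. The function name `elemental_combinations` is hypothetical, and the package's actual implementation and output ordering may differ.

```python
from itertools import combinations_with_replacement


def elemental_combinations(elements, max_atoms=3):
    """List unordered combinations of elements, largest molecules first."""
    combos = []
    # Start at the largest molecule size and work down to single atoms.
    for n_atoms in range(max_atoms, 0, -1):
        combos.extend(combinations_with_replacement(elements, n_atoms))
    return combos


combos = elemental_combinations(["Ar", "O"], max_atoms=2)
# combos == [("Ar", "Ar"), ("Ar", "O"), ("O", "O"), ("Ar",), ("O",)]
```

The number of combinations grows quickly with both the number of elements and max_atoms, which is why the default is kept small.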

interferences.table.combinations.get_isotopic_combinations(element_comb, threshold=None)[source]

Take a combination of elements and expand it to generate the potential combinations of isotopes.

Parameters
  • element_comb (list) – List of elements for which to combine lists of isotopes.

  • threshold (float) – Threshold below which to ignore low-abundance isotopes.

Return type

list
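The expansion from elements to isotopes can be sketched as a Cartesian product over each element's isotope list. This is an illustration only: the `ISOTOPES` table holds approximate abundances in percent, `isotopic_combinations` is a hypothetical name, and treating the threshold as a percentage is an assumption rather than something the documentation states.

```python
from itertools import product

# Approximate natural abundances in percent (illustrative values only).
ISOTOPES = {
    "H": {"H[1]": 99.9885, "H[2]": 0.0115},
    "Cl": {"Cl[35]": 75.76, "Cl[37]": 24.24},
}


def isotopic_combinations(element_comb, threshold=None):
    """Expand a combination of elements into combinations of isotopes."""
    choices = []
    for element in element_comb:
        # Keep every isotope, or only those at or above the threshold.
        isotopes = [
            iso
            for iso, abundance in ISOTOPES[element].items()
            if threshold is None or abundance >= threshold
        ]
        choices.append(isotopes)
    # One isotope is chosen per position in the elemental combination.
    return [list(combo) for combo in product(*choices)]


all_combos = isotopic_combinations(["H", "Cl"])
common_combos = isotopic_combinations(["H", "Cl"], threshold=1.0)
```

For a combination with a repeated element (e.g. two oxygen atoms), a plain product yields both orderings of each mixed pair, so a real implementation would likely need to de-duplicate these.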

interferences.table.combinations.component_subtable(components, charges=[1, 2], threshold=None)[source]

Build a sub-table from a set of elemental components.

Parameters
  • components (list) – List of elements to combine in the subtable.

  • charges (list ( int )) – Ionic charges to include in the model.

  • threshold (float) – Threshold for isotopic abundance for inclusion of low-abundance/non-stable isotopes.

Return type

pandas.DataFrame

interferences.table.intensity

Functions to threshold, combine and estimate intensities of elements and isotopes based on their abundances.

interferences.table.intensity.isotope_abundance_threshold(isotopes, threshold=None)[source]

Remove isotopes with no recorded abundance or zero abundance from a list.

Parameters
  • isotopes (list) – List of isotopes to filter.

  • threshold (float) – Minimum isotope abundance for inclusion.

Return type

list
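A minimal sketch of this kind of filter is shown below. It assumes isotopes are represented as (label, abundance) pairs with abundance in percent, or None where no abundance is recorded; the package works with its own isotope objects, and `abundance_threshold` is a hypothetical name.

```python
def abundance_threshold(isotopes, threshold=None):
    """Drop isotopes with missing, zero or below-threshold abundance."""
    kept = []
    for label, abundance in isotopes:
        if not abundance:
            # Covers both None (no recorded abundance) and 0.
            continue
        if threshold is not None and abundance < threshold:
            continue
        kept.append((label, abundance))
    return kept


# Approximate potassium abundances in percent; K[42] has no natural abundance.
potassium = [("K[39]", 93.2581), ("K[40]", 0.0117), ("K[41]", 6.7302), ("K[42]", None)]

naturally_occurring = abundance_threshold(potassium)
major_only = abundance_threshold(potassium, threshold=1.0)
```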

interferences.table.intensity.get_isotopic_abundance_product(components)[source]

Estimates the abundance of a molecule based on the abundance of the isotopic components.

Return type

float

Notes

This is essentially a simplistic activity model. Isotopic abundances from periodictable are in %, and are hence divided by 100 here.
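The calculation described in the note can be sketched as a product of fractional abundances. The helper below takes plain percentages rather than the package's component objects, and both the name `abundance_product` and the input values are illustrative assumptions.

```python
from math import prod


def abundance_product(abundances_percent):
    """Multiply isotopic abundances given in percent, as fractions of one."""
    return prod(abundance / 100.0 for abundance in abundances_percent)


# H[1] (about 99.9885 %) combined with Cl[35] (about 75.76 %).
h_cl35 = abundance_product([99.9885, 75.76])
```

The result is a relative weighting between 0 and 1 rather than a measured intensity, which is why the note calls it a simplistic model.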

interferences.table.molecules

Functions for creating, formatting and serialising representations of molecules.

interferences.table.molecules.components_from_index_value(idx)[source]

interferences.table.molecules.deduplicate(df, charges=None, multiples=True)[source]

De-duplicate a dataframe index based on index values and molecule-multiples.

Parameters
  • df (pandas.DataFrame) – Dataframe to check the index of.

  • charges (list) – List of valid charges for the frame.

  • multiples (bool) – Whether to remove molecule-multiples.

Return type

pandas.DataFrame

interferences.table.molecules.repr_formula(molecule)[source]

Get a string representation of a formula which preserves element and isotope information.

interferences.table.molecules.get_formatted_formula(molecule, sorted=False)[source]

Construct a formatted name for a molecule.

Parameters
  • molecule (Formula) – Molecule to name.

  • sorted (bool) – Whether a molecular formula is already sorted, so sorting can be skipped.

Return type

str

interferences.table.molecules.get_molecule_labels(df, **kwargs)[source]

Get labels for molecules based on their composition and charge.

Parameters

df (pandas.DataFrame)

Return type

pandas.Series

interferences.table.molecules.molecule_from_components(components)[source]

Builds a Formula from a list of atom or isotope components.

Parameters

components (list) – Atomic, isotope or molecular components to construct an ionic molecule from.

Return type

Formula

Todo

  • Modify to accept consumption of molecular components (e.g. Fe2O3+)

See also

pyrolite.mineral.transform.merge_formulae()

interferences.table.store

interferences.table.store.load_store(path=None, complevel=4, complib='lzo', **kwargs)[source]

Load the interferences HDF store.

Parameters
  • path (str | pathlib.Path) – Path to the store.

  • complevel (int) – Compression level option for the HDF store. Uncompressed tables can easily reach a few hundred MB - this isn’t an issue on a local disk, but can be limiting for web transfer.

  • complib (str) – Which compression library to use.

Return type

pandas.HDFStore

interferences.table.store.lookup_components(identifier, path=None, key='table', window=None, **kwargs)[source]

Look up a list of components from the store based on their identifiers.

Parameters
  • identifier (str) – Identifier for the components to look up.

  • path (str | pathlib.Path) – Path to store to search.

  • key (str) – Key for the table within the store.

  • window (tuple) – Window for indexing along m/z to return a subset of results.

  • drop_first_level (bool) – Whether to drop the first level of the index for simplicity.

Return type

pandas.DataFrame

interferences.table.store.get_store_index(path, drop_first_level=True, **kwargs)[source]

interferences.table.store.process_subtables(dfs, charges=None, dump=True, path=None, mode='a', data_columns=['parts', 'elements', 'm_z', 'iso_abund_product'], complevel=4, complib='lzo', **kwargs)[source]

Process and optionally dump a set of subtables to file, appending to the hierarchically-indexed table.

Parameters
  • dfs (list ( pandas.DataFrame )) – Dataframes to dump.

  • charges (list) – Charges used to create the table.

  • path (str | pathlib.Path) – Path to the file to add the table to.

  • mode (str) – Mode for accessing the HDF file.

  • data_columns (list) – List of columns to create indexes for, to allow query-by-data.

  • complevel (int) – Compression level option for the HDF store. Uncompressed tables can easily reach a few hundred MB - this isn’t an issue on a local disk, but can be limiting for web transfer.

  • complib (str) – Which compression library to use.

Returns

De-duplicated concatenated version of new tables.

Return type

pandas.DataFrame

interferences.table.store.reset_table(path=None, remove=True, key='table', format='table', complevel=4, complib='lzo', **kwargs)[source]

Reset or remove an HDF store.

Parameters
  • path (str | pathlib.Path) – Path to store.

  • remove (bool) – Whether to remove the table from disk, if possible.

  • format (str) – Format to set for the new tables.

  • complevel (int) – Compression level option for the HDF store. Uncompressed tables can easily reach a few hundred MB - this isn’t an issue on a local disk, but can be limiting for web transfer.

  • complib (str) – Which compression library to use.