pygauss.analysis module
-
class pygauss.analysis.Analysis(folderpath='', server=None, username=None, passwrd=None, folder_obj=None, headers=[])
Bases: object
a class to analyse multiple computations
Parameters: - folderpath (str) – the folder directory storing the files to be analysed
- server (str) – the name of the server storing the files to be analysed
- username (str) – the username to connect to the server
- passwrd (str) – server password, if not present it will be asked for during initialisation
- headers (list) – the variable categories for each computation
-
add_basic_properties(props=['basis', 'nbasis', 'optimised', 'conformer'])
adds columns giving info of basic run properties
-
add_mol_property(name, method, *args, **kwargs)
compute molecule property for all rows and create a data column
-
add_mol_property_subset(name, method, rows=[], filters={}, args=[], kwargs={}, relative_to_rows=[])
compute molecule property for a subset of rows and create/add-to data column
Parameters: - name (str or list of strings) – name for output column (multiple if method outputs more than one value)
- method (str) – what molecule method to call
- rows (list) – what molecule rows to calculate the property for
- filters (dict) – filter for selecting molecules to calculate the property for
- args (list) – the arguments to pass to the molecule method
- kwargs (dict) – the keyword arguments to pass to the molecule method
- relative_to_rows (list of ints) – compute values relative to the summed value(s) of the molecules at the rows listed
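One way to read relative_to_rows is as a reference subtraction: the property values at the listed rows are summed, and that sum is subtracted from each computed value. The sketch below illustrates this reading with plain Python lists. The helper name `relative_to_reference`, the subtraction convention and the numbers are illustrative assumptions, not pygauss code.

```python
def relative_to_reference(values, reference_rows):
    """Return each value minus the summed value of the reference rows.

    Assumption: "relative to the summed value(s)" means subtraction.
    """
    reference = sum(values[row] for row in reference_rows)
    return [value - reference for value in values]


# Hypothetical energies: row 0 is a complex, rows 1 and 2 are its fragments.
energies = [-10.0, -4.0, -5.5]
relative = relative_to_reference(energies, reference_rows=[1, 2])
print(relative[0])  # energy of row 0 relative to the summed fragments
```

Under this reading, the value for row 0 would be an interaction-style quantity: the energy of the complex minus the combined energy of the two fragment rows.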
-
add_run(identifiers={}, init_fname=None, opt_fname=None, freq_fname=None, nbo_fname=None, alignto=[], atom_groups={}, add_if_error=False, folder_obj=None)
add single Gaussian run input/outputs
-
add_runs(headers=[], values=[], init_pattern=None, opt_pattern=None, freq_pattern=None, nbo_pattern=None, add_if_error=False, alignto=[], atom_groups={}, ipython_print=False)
add multiple Gaussian run inputs/outputs
-
calc_kmean_groups(category_column, category_name, groups, columns=[], rows=[], filters={})
calculate the kmeans grouping of rows
The KMeans algorithm clusters data by trying to separate samples in n groups of equal variance, minimizing a criterion known as the inertia or within-cluster sum-of-squares. This algorithm requires the number of clusters to be specified. It scales well to large number of samples and has been used across a large range of application areas in many different fields.
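The clustering criterion described above can be illustrated with a small pure-Python version of Lloyd's iteration, the standard way of minimising the within-cluster sum-of-squares. This is a sketch for intuition only: the function names, the random initialisation and the sample points are assumptions, and it is not a statement of how calc_kmean_groups is implemented.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, n_clusters, n_iter=100, seed=0):
    """Cluster points (tuples of floats) with Lloyd's algorithm."""
    rng = random.Random(seed)
    centroids = rng.sample(points, n_clusters)  # initial guesses
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(n_clusters), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for c in range(n_clusters):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                mean = tuple(sum(dim) / len(members) for dim in zip(*members))
                new_centroids.append(mean)
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster in place
        if new_centroids == centroids:
            break  # converged
        centroids = new_centroids
    # Inertia: the within-cluster sum-of-squares being minimised.
    inertia = sum(squared_distance(p, centroids[l]) for p, l in zip(points, labels))
    return labels, centroids, inertia


points = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (5.0, 5.0), (5.1, 5.2), (5.2, 5.1)]
labels, centroids, inertia = kmeans(points, n_clusters=2)
print(labels, inertia)
```

Note that the number of clusters has to be supplied up front, as the description says. The numeric label given to each cluster depends on the initialisation, so only the grouping is meaningful.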
-
folder
The folder for Gaussian runs
-
get_basic_property(prop, *args, **kwargs)
returns a series of a basic run property or nan if it is not available
Parameters: prop (str) – can be ‘basis’, ‘nbasis’, ‘optimised’, ‘opt_error’ or ‘conformer’
-
get_freq_analysis(info_columns=[], rows=[], filters={})
return frequency analysis
Parameters: - info_columns (list of str) – columns to use as info in caption
- rows (int or list) – index for the row of each molecule to plot (all plotted if empty)
- filters (dict) – {columns:values} to filter by
Returns: data – frequency data
Return type: pd.DataFrame
-
get_table(rows=[], columns=[], filters={}, precision=4, head=False, mol=False, row_index=[], column_index=[], as_image=False, na_rep='-', font_size=None, width=None, height=None, unconfined=False)
return pandas table of requested data in requested format
Parameters: - rows (int or list of ints) – select row ids
- columns (str/int or list of str/int) – select column names/positions
- filters (dict) – filter for rows with certain value(s) in specific columns
- precision (int) – decimal precision of displayed values
- head (int) – return only first n rows
- mol (bool) – include column containing the molecule objects
- row_index (str or list of str) – columns to use as new index
- column_index (list of str) – strings to place into higher-order column indexes
- as_image (bool) – output the table as an image (uses pygauss.utils.df_to_img)
- na_rep (str) – how to represent empty (nan) cells (if outputting image)
- width, height, unconfined (int, int, bool) – arguments for the IPython Image
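The rows and filters arguments recur across the methods on this page. The sketch below shows one plausible reading of how they combine, using a list of dicts in place of the underlying table: rows restricts by position, and each filters entry keeps records whose column value equals the given value or is one of the given values. The helper `select_records`, the column names and the matching rules are assumptions for illustration; the library may treat edge cases differently.

```python
def select_records(table, rows=(), filters=None):
    """Return (index, record) pairs that pass both selections.

    Assumptions: an empty `rows` means "all rows", and a filter value may
    be a single value or a collection of accepted values.
    """
    wanted_rows = [rows] if isinstance(rows, int) else list(rows)
    filters = filters or {}
    selected = []
    for index, record in enumerate(table):
        if wanted_rows and index not in wanted_rows:
            continue
        keep = True
        for column, accepted in filters.items():
            if not isinstance(accepted, (list, tuple, set)):
                accepted = [accepted]
            if record.get(column) not in accepted:
                keep = False
                break
        if keep:
            selected.append((index, record))
    return selected


# Hypothetical table with two category columns.
table = [
    {"Cation": "emim", "Anion": "cl"},
    {"Cation": "emim", "Anion": "bf4"},
    {"Cation": "bmim", "Anion": "cl"},
]
chlorides = select_records(table, filters={"Anion": "cl"})
first_two = select_records(table, rows=[0, 1], filters={"Anion": ["cl", "bf4"]})
print([index for index, _ in chlorides])
```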
-
plot_freq_analysis(info_columns=[], rows=[], filters={}, share_plot=True, include_row=False)
plot frequency analysis
Parameters: - info_columns (list of str) – columns to use as info in caption
- rows (int or list) – index for the row of each molecule to plot (all plotted if empty)
- filters (dict) – {columns:values} to filter by
- share_plot (bool) – whether to share a single plot or have multiple ones
- include_row (bool) – include row number in legend labels
Returns: data – plotted frequency data
Return type: matplotlib.figure.Figure
-
plot_mol_images(mtype='optimised', info_columns=[], info_incl_id=False, max_cols=1, label_size=20, start_letter='A', save_fname=None, rows=[], filters={}, align_to=[], rotations=[[0.0, 0.0, 0.0]], gbonds=True, represent='ball_stick', zoom=1.0, width=500, height=500, axis_length=0, relative=False, minval=-1, maxval=1, highlight=[], frame_on=False, eunits='kJmol-1', sopt_min_energy=20.0, sopt_cutoff_energy=0.0, atom_groups=[], alpha=0.5, transparent=False, hbondwidth=5, no_hbonds=False)
show molecules in matplotlib table of axes
Parameters: - mtype – ‘initial’, ‘optimised’, ‘nbo’, ‘highlight’, ‘sopt’ or ‘hbond’
- info_columns (list of str) – columns to use as info in caption
- info_incl_id (bool) – include molecule id number in caption
- max_cols (int) – maximum columns in plot
- label_size (int) – subplot label size (pts)
- start_letter (str) – starting (capital) letter for labelling subplots
- save_fname (str) – name of file, if you wish to save the plot to file
- rows (int or list) – index for the row of each molecule to plot (all plotted if empty)
- filters (dict) – {columns:values} to filter by
- align_to ([int, int, int]) – align geometries to the plane containing these atoms
- rotations (list of [float, float, float]) – for each rotation set [x,y,z] an image will be produced
- gbonds (bool) – guess bonds between atoms (via distance)
- represent (str) – representation of molecule (‘none’, ‘wire’, ‘vdw’ or ‘ball_stick’)
- zoom (float) – zoom level of images
- width (int) – width of original images
- height (int) – height of original images (although width takes precedence)
- axis_length (float) – length of x,y,z axes in negative and positive directions
- relative (bool) – coloring of nbo atoms scaled to min/max values in atom set (for nbo mtype)
- minval (float) – coloring of nbo atoms scaled to absolute min (for nbo mtype)
- maxval (float) – coloring of nbo atoms scaled to absolute max (for nbo mtype)
- highlight (list of lists) – atom indexes to highlight (for highlight mtype)
- eunits (str) – the units of energy to return (for sopt/hbond mtype)
- sopt_min_energy (float) – minimum energy to show (for sopt/hbond mtype)
- sopt_cutoff_energy (float) – energy below which bonds will be dashed (for sopt mtype)
- alpha (float) – alpha color value of geometry (for sopt/hbond mtypes)
- transparent (bool) – whether atoms should be transparent (for sopt/hbond mtypes)
- hbondwidth (float) – width of lines depicting interaction (for hbond mtypes)
- atom_groups ([list or str, list or str]) – restrict interactions to between two lists (or identifiers) of atom indexes (for sopt/hbond mtypes)
- no_hbonds (bool) – whether to ignore H-Bonds in the calculation (for sopt only)
- frame_on (bool) – whether to show frame around each image
Returns: - fig (matplotlib.figure.Figure) – A figure containing subplots for each molecule image
- caption (str) – A caption describing each subplot, given info_columns
-
plot_radviz_comparison(category_column, columns=[], rows=[], filters={}, point_size=30, **kwargs)
return plot axis of radviz graph
RadViz is a way of visualizing multi-variate data. It is based on a simple spring tension minimization algorithm. Basically you set up a bunch of points in a plane. In our case they are equally spaced on a unit circle. Each point represents a single attribute. You then pretend that each sample in the data set is attached to each of these points by a spring, the stiffness of which is proportional to the numerical value of that attribute (they are normalized to unit interval). The point in the plane where our sample settles (where the forces acting on our sample are at an equilibrium) is where a dot representing our sample will be drawn. Depending on which class that sample belongs to, it will be colored differently.
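The spring picture above has a closed-form equilibrium: with attribute weights w_j (normalised to the unit interval) and anchors a_j on the unit circle, the forces balance at p = sum(w_j * a_j) / sum(w_j). The sketch below computes these positions in plain Python. The function name, the handling of all-zero rows and the sample data are assumptions made for illustration, not the plotting code used by plot_radviz_comparison.

```python
import math


def radviz_positions(rows):
    """Return (anchors, positions) for rows of numeric attributes."""
    n_attrs = len(rows[0])
    # One anchor per attribute, equally spaced on the unit circle.
    anchors = [
        (math.cos(2 * math.pi * j / n_attrs), math.sin(2 * math.pi * j / n_attrs))
        for j in range(n_attrs)
    ]
    # Normalise every attribute (column) to the unit interval.
    columns = list(zip(*rows))
    lows = [min(c) for c in columns]
    spans = [(max(c) - min(c)) or 1.0 for c in columns]
    positions = []
    for row in rows:
        weights = [(v - lo) / span for v, lo, span in zip(row, lows, spans)]
        total = sum(weights)
        if total == 0:
            positions.append((0.0, 0.0))  # assumption: no pull, stay at the centre
            continue
        # Equilibrium of the springs: weighted mean of the anchors.
        x = sum(w * a[0] for w, a in zip(weights, anchors)) / total
        y = sum(w * a[1] for w, a in zip(weights, anchors)) / total
        positions.append((x, y))
    return anchors, positions


samples = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (1.0, 1.0, 1.0)]
anchors, positions = radviz_positions(samples)
print(positions)
```

A sample dominated by one attribute lands on that attribute's anchor, while a sample with equal normalised values sits at the centre of the circle.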
-
remove_rows(rows)
remove one or more rows of molecules
Parameters: rows (int or list of ints) – the rows to remove
-
yield_mol_images(rows=[], filters={}, mtype='optimised', align_to=[], rotations=[[0.0, 0.0, 0.0]], gbonds=True, represent='ball_stick', zoom=1.0, width=300, height=300, axis_length=0, relative=False, minval=-1, maxval=1, highlight=[], active=False, sopt_min_energy=20.0, sopt_cutoff_energy=0.0, atom_groups=[], alpha=0.5, transparent=False, hbondwidth=5, eunits='kJmol-1', no_hbonds=False, ipyimg=True)
yields molecules
Parameters: - mtype – ‘initial’, ‘optimised’, ‘nbo’, ‘highlight’, ‘sopt’ or ‘hbond’
- rows (int or list) – index for the row of each molecule to yield (all yielded if empty)
- filters (dict) – {columns:values} to filter by
- align_to ([int, int, int]) – align geometries to the plane containing these atoms
- rotations (list of [float, float, float]) – for each rotation set [x,y,z] an image will be produced
- gbonds (bool) – guess bonds between atoms (via distance)
- represent (str) – representation of molecule (‘none’, ‘wire’, ‘vdw’ or ‘ball_stick’)
- zoom (float) – zoom level of images
- width (int) – width of original images
- height (int) – height of original images (although width takes precedence)
- axis_length (float) – length of x,y,z axes in negative and positive directions
- relative (bool) – coloring of nbo atoms scaled to min/max values in atom set (for nbo mtype)
- minval (float) – coloring of nbo atoms scaled to absolute min (for nbo mtype)
- maxval (float) – coloring of nbo atoms scaled to absolute max (for nbo mtype)
- highlight (list of lists) – atom indexes to highlight (for highlight mtype)
- eunits (str) – the units of energy to return (for sopt/hbond mtype)
- sopt_min_energy (float) – minimum energy to show (for sopt/hbond mtype)
- sopt_cutoff_energy (float) – energy below which bonds will be dashed (for sopt mtype)
- alpha (float) – alpha color value of geometry (for highlight/sopt/hbond mtypes)
- transparent (bool) – whether atoms should be transparent (for highlight/sopt/hbond mtypes)
- hbondwidth (float) – width of lines depicting interaction (for hbond mtypes)
- atom_groups ([list or str, list or str]) – restrict interactions to between two lists (or identifiers) of atom indexes (for sopt/hbond mtypes)
- no_hbonds (bool) – whether to ignore H-Bonds in the calculation
- ipyimg (bool) – whether to return an IPython image, PIL image otherwise
Yields: - indx (int) – the row index of the molecule
- mol (IPython.display.Image or PIL.Image) – an image of the molecule in the format specified by ipyimg