pygauss.analysis module

class pygauss.analysis.Analysis(folderpath='', server=None, username=None, passwrd=None, folder_obj=None, headers=[])

Bases: object

a class to analyse multiple computations

Parameters:

folderpath (str) – the directory storing the files to be analysed
server (str) – the name of the server storing the files to be analysed
username (str) – the username to connect to the server
passwrd (str) – server password; if not present, it will be asked for during initialisation
headers (list) – the variable categories for each computation

add_basic_properties(props=['basis', 'nbasis', 'optimised', 'conformer'])

adds columns giving information on basic run properties

add_mol_property(name, method, *args, **kwargs)

compute a molecule property for all rows and create a data column

Parameters:

name (str) – what to name the data column
method (str) – what molecule method to call
*args – arguments to pass to the molecule method
**kwargs – keyword arguments to pass to the molecule method

add_mol_property_subset(name, method, rows=[], filters={}, args=[], kwargs={}, relative_to_rows=[])

compute a molecule property for a subset of rows and create (or add to) a data column

Parameters:

name (str or list of strings) – name for output column (multiple if method outputs more than one value)
method (str) – what molecule method to call
rows (list) – what molecule rows to calculate the property for
filters (dict) – filter for selecting molecules to calculate the property for
args (list) – the arguments to pass to the molecule method
kwargs (dict) – the keyword arguments to pass to the molecule method
relative_to_rows (list of ints) – compute values relative to the summed value(s) of the molecules at the listed rows
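One plausible reading of `relative_to_rows` is that the property, summed over the reference rows, is subtracted from every computed value. This is the pattern of an interaction energy, E(AB) - E(A) - E(B). The pure-Python sketch below assumes exactly that behaviour. The helper `relative_values` and its dict-of-rows input are hypothetical and are not part of pygauss.

```python
from typing import Dict, List


def relative_values(values: Dict[int, float], relative_to_rows: List[int]) -> Dict[int, float]:
    """Hypothetical helper (not pygauss API): subtract the summed reference value.

    Assumes ``values`` maps row index -> computed property for that row and
    that every reference row is present in ``values``.
    """
    # Sum the property over the reference rows (e.g. isolated fragments).
    reference = sum(values[row] for row in relative_to_rows)
    # Express every row's value relative to that summed reference.
    return {row: value - reference for row, value in values.items()}


# Energies (arbitrary units) for a complex (row 0) and its two fragments.
energies = {0: -150.0, 1: -70.0, 2: -75.0}
print(relative_values(energies, relative_to_rows=[1, 2]))  # {0: -5.0, 1: 75.0, 2: 70.0}
```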

add_run(identifiers={}, init_fname=None, opt_fname=None, freq_fname=None, nbo_fname=None, alignto=[], atom_groups={}, add_if_error=False, folder_obj=None)

add a single Gaussian run's input/output files

add_runs(headers=[], values=[], init_pattern=None, opt_pattern=None, freq_pattern=None, nbo_pattern=None, add_if_error=False, alignto=[], atom_groups={}, ipython_print=False, folder_obj=None)

add multiple Gaussian runs' input/output files

calc_kmean_groups(category_column, category_name, groups, columns=[], rows=[], filters={})

calculate the k-means grouping of rows

The k-means algorithm clusters data by trying to separate samples into n groups of equal variance, minimizing a criterion known as the inertia, or within-cluster sum of squares. The number of clusters must be specified in advance. The algorithm scales well to large numbers of samples and has been applied across a wide range of fields.
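The k-means procedure described above is usually solved with Lloyd's algorithm. That algorithm alternates between assigning each sample to its nearest centroid and moving each centroid to the mean of its members, which lowers the inertia at each step. The pure-Python sketch below illustrates the technique only. It is not the implementation behind `calc_kmean_groups`, and the name `kmeans` and its arguments are hypothetical.

```python
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def _sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    # Squared Euclidean distance between two points.
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points: List[Point], k: int, n_iter: int = 100, seed: int = 0) -> Tuple[List[int], float]:
    """Lloyd's algorithm sketch: return (cluster label per point, inertia)."""
    rng = random.Random(seed)
    # Initialise centroids from k distinct input points.
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: attach each point to its nearest centroid.
        new_labels = [min(range(k), key=lambda c: _sq_dist(p, centroids[c])) for p in points]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == c]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[c] = [sum(coords) / len(members) for coords in zip(*members)]
        if new_labels == labels:
            break  # assignments no longer change: converged
        labels = new_labels
    # Inertia: within-cluster sum of squared distances to the centroids.
    inertia = sum(_sq_dist(p, centroids[lab]) for p, lab in zip(points, labels))
    return labels, inertia


pts = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
labels, inertia = kmeans(pts, k=2)
print(labels, inertia)
```

Because the result depends on the initial centroids, practical implementations typically run several initialisations and keep the lowest-inertia result.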

copy()

count_runs()

get the number of runs held in the analysis

folder

the folder for Gaussian runs

get_basic_property(prop, *args, **kwargs)

returns a series of a basic run property, or NaN where it is not available

Parameters:	prop (str) – can be ‘basis’, ‘nbasis’, ‘optimised’, ‘opt_error’ or ‘conformer’

get_folder()

get_ids(variable_names, variable_lists)

return ids of a list of unique computations

get_molecule(row)

get the molecule object corresponding to a particular row

get_table(rows=[], columns=[], filters={}, precision=4, head=False, mol=False, row_index=[], column_index=[], as_image=False, na_rep='-', font_size=None, width=None, height=None, unconfined=False)

return a pandas table of the requested data in the requested format

Parameters:

rows (int or list of ints) – select row ids
columns (str/int or list of strs/ints) – select column names/positions
filters (dict) – filter for rows with certain value(s) in specific columns
precision (int) – decimal precision of displayed values
head (int) – return only the first n rows
mol (bool) – include a column containing the molecule objects
row_index (str or list of strs) – columns to use as the new index
column_index (list of strs) – strings to place into higher-order column indexes
as_image (bool) – output the table as an image (uses pygauss.utils.df_to_img)
na_rep (str) – how to represent empty (NaN) cells (if outputting an image)
width, height, unconfined (int, int, bool) – arguments for the IPython Image

Returns:	df – a table of data
Return type:	pandas.DataFrame
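Several methods take a `filters` dict that selects rows holding certain value(s) in specific columns. The pure-Python sketch below shows one plausible interpretation. It assumes that a list value means "any of these values", that any other value must match exactly, and that an empty dict keeps every row. Both `apply_filters` and the example column names are hypothetical and are not pygauss API.

```python
from typing import Any, Dict, List


def apply_filters(rows: List[Dict[str, Any]], filters: Dict[str, Any]) -> List[int]:
    """Hypothetical helper (not pygauss API): indexes of rows matching every filter.

    Assumption: a list value means "any of these values"; any other value must
    match the row's entry exactly. An empty filter dict keeps every row.
    """
    def accepted(wanted: Any) -> List[Any]:
        # Normalise a single value to a one-element list of accepted values.
        return wanted if isinstance(wanted, list) else [wanted]

    return [
        index
        for index, row in enumerate(rows)
        if all(row.get(column) in accepted(wanted) for column, wanted in filters.items())
    ]


table = [
    {"solvent": "water", "basis": "6-31G"},
    {"solvent": "water", "basis": "cc-pVDZ"},
    {"solvent": "gas", "basis": "6-31G"},
]
print(apply_filters(table, {"solvent": "water"}))  # [0, 1]
print(apply_filters(table, {"solvent": "gas", "basis": ["6-31G", "cc-pVDZ"]}))  # [2]
```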

plot_mol_graphs(gtype='energy', share_plot=False, max_cols=1, padding=(1, 1), tick_rotation=0, rows=[], filters={}, sort_columns=[], info_columns=[], info_incl_id=False, letter_prefix='', start_letter='A', grid=True, sharex=True, sharey=True, legend_size=10, color_scheme='jet', eunits='eV', per_energy=1.0, lbound=None, ubound=None, color_homo='g', color_lumo='r', homo_lumo_lines=True, homo_lumo_values=True, band_gap_value=True)

get a set of data plots for each molecule

Parameters:

gtype (str) – the type of plot: energy = optimisation energies, freq = frequency analysis, dos = density of states
share_plot (bool) – whether to plot all data on the same or separate axes
max_cols (int) – maximum columns on plots (share_plot=False only)
padding (tuple) – padding between images (horizontally, vertically)
tick_rotation (int) – rotation of x-axis labels
rows (int or list) – index for the row of each molecule to plot (all plotted if empty)
filters (dict) – {columns:values} to filter by
sort_columns (list of str) – columns to sort by
info_columns (list of str) – columns to use as info in caption
info_incl_id (bool) – include molecule id number in labels
letter_prefix (str) – prefix for labelling subplots (share_plot=False only)
start_letter (str) – starting (capital) letter for labelling subplots (share_plot=False only)
grid (bool) – whether to include a grid in the axes
sharex (bool) – whether to align x-axes (share_plot=False only)
sharey (bool) – whether to align y-axes (share_plot=False only)
legend_size (int) – the font size (in pts) for the legend
color_scheme (str) – the colormap used to color each molecule (share_plot=True only), as listed at http://matplotlib.org/examples/color/colormaps_reference.html
eunits (str) – the units of energy to use
per_energy (float) – energy interval to group states by (DoS only)
lbound (float) – lower bound energy (DoS only)
ubound (float) – upper bound energy (DoS only)
color_homo (matplotlib.colors) – color of the HOMO, in matplotlib color format
color_lumo (matplotlib.colors) – color of the LUMO, in matplotlib color format
homo_lumo_lines (bool) – draw lines at HOMO and LUMO energies
homo_lumo_values (bool) – annotate HOMO and LUMO lines with exact energy values
band_gap_value (bool) – annotate the region between the HOMO and LUMO lines with the band gap value

Returns:

data (matplotlib.figure.Figure) – the plotted data
caption (str) – A caption describing each subplot, given info_columns
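For DoS plots, `per_energy`, `lbound` and `ubound` control how orbital energies are grouped. The HOMO/LUMO options mark the highest occupied and lowest unoccupied orbitals, and the band gap is their energy difference. The pure-Python sketch below computes these quantities under some assumptions. It takes orbital energies already in the chosen units, with occupied orbitals flagged explicitly, and uses bins of width `per_energy` starting at `lbound`. The helpers `band_gap` and `dos_histogram` are hypothetical and are not pygauss functions.

```python
import math
from typing import List, Optional, Tuple


def band_gap(energies: List[float], occupied: List[bool]) -> Tuple[float, float, float]:
    """Hypothetical helper: return (HOMO, LUMO, gap) from orbital energies."""
    homo = max(e for e, occ in zip(energies, occupied) if occ)
    lumo = min(e for e, occ in zip(energies, occupied) if not occ)
    return homo, lumo, lumo - homo


def dos_histogram(energies: List[float], per_energy: float = 1.0,
                  lbound: Optional[float] = None, ubound: Optional[float] = None) -> List[int]:
    """Hypothetical helper: count states in bins of width per_energy."""
    if per_energy <= 0:
        raise ValueError("per_energy must be positive")
    lo = min(energies) if lbound is None else lbound
    hi = max(energies) if ubound is None else ubound
    n_bins = max(1, math.ceil((hi - lo) / per_energy))
    counts = [0] * n_bins
    for e in energies:
        if lo <= e <= hi:
            # Bin i covers [lo + i*per_energy, lo + (i+1)*per_energy); the top edge joins the last bin.
            i = min(int((e - lo) // per_energy), n_bins - 1)
            counts[i] += 1
    return counts


energies = [-10.2, -9.8, -7.1, -6.5, -1.2, 0.4]  # illustrative values, in eV
occupied = [True, True, True, True, False, False]
print(band_gap(energies, occupied))
print(dos_histogram(energies, per_energy=2.0, lbound=-11.0, ubound=1.0))  # [2, 1, 1, 0, 1, 1]
```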

plot_mol_images(mtype='optimised', max_cols=1, padding=(1, 1), sort_columns=[], info_columns=[], info_incl_id=False, label_size=20, letter_prefix='', start_letter='A', rows=[], filters={}, align_to=[], rotations=[[0.0, 0.0, 0.0]], gbonds=True, represent='ball_stick', zoom=1.0, width=500, height=500, axis_length=0, relative=False, minval=-1, maxval=1, highlight=[], frame_on=False, eunits='kJmol-1', sopt_min_energy=20.0, sopt_cutoff_energy=0.0, atom_groups=[], alpha=0.5, transparent=False, hbondwidth=5, no_hbonds=False)

show molecules in a matplotlib grid of axes

Parameters:

mtype – ‘initial’, ‘optimised’, ‘nbo’, ‘highlight’, ‘highlight-initial’, ‘sopt’ or ‘hbond’
max_cols (int) – maximum columns in plot
padding (tuple) – padding between images (horizontally, vertically)
sort_columns (list of str) – columns to sort by
info_columns (list of str) – columns to use as info in caption
info_incl_id (bool) – include molecule id number in caption
label_size (int) – subplot label size (pts)
letter_prefix (str) – prefix for labelling subplots
start_letter (str) – starting (capital) letter for labelling subplots
rows (int or list) – index for the row of each molecule to plot (all plotted if empty)
filters (dict) – {columns:values} to filter by
align_to ([int, int, int]) – align geometries to the plane containing these atoms
rotations (list of [float, float, float]) – for each rotation set [x,y,z] an image will be produced
gbonds (bool) – guess bonds between atoms (via distance)
represent (str) – representation of molecule (‘none’, ‘wire’, ‘vdw’ or ‘ball_stick’)
zoom (float) – zoom level of images
width (int) – width of original images
height (int) – height of original images (although width takes precedence)
axis_length (float) – length of x,y,z axes in negative and positive directions
relative (bool) – coloring of nbo atoms scaled to min/max values in atom set (for nbo mtype)
minval (float) – coloring of nbo atoms scaled to absolute min (for nbo mtype)
maxval (float) – coloring of nbo atoms scaled to absolute max (for nbo mtype)
highlight (list of lists) – atom indexes to highlight (for highlight mtype)
eunits (str) – the units of energy to return (for sopt/hbond mtype)
sopt_min_energy (float) – minimum energy to show (for sopt/hbond mtype)
sopt_cutoff_energy (float) – energy below which bonds will be dashed (for sopt mtype)
alpha (float) – alpha color value of geometry (for sopt/hbond mtypes)
transparent (bool) – whether atoms should be transparent (for sopt/hbond mtypes)
hbondwidth (float) – width of lines depicting interaction (for hbond mtypes)
atom_groups ([list or str, list or str]) – restrict interactions to between two lists (or identifiers) of atom indexes (for sopt/hbond mtypes)
no_hbonds (bool) – whether to ignore H-Bonds in the calculation (for sopt only)
frame_on (bool) – whether to show frame around each image

Returns:

fig (matplotlib.figure.Figure) – A figure containing subplots for each molecule image
caption (str) – A caption describing each subplot, given info_columns
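`align_to` names three atoms that define a plane, and each `[x, y, z]` entry in `rotations` produces one image. The pure-Python sketch below shows the underlying geometry under some assumptions. The plane is described by the unit normal of the cross product of two in-plane vectors. Rotation angles are taken in degrees and applied about the x, then y, then z axis. pygauss's actual conventions (angle units, rotation order, which axis the plane is aligned to) may differ, and `plane_normal` and `rotate` are hypothetical helpers.

```python
import math
from typing import List, Sequence

Vec = List[float]


def _sub(a: Sequence[float], b: Sequence[float]) -> Vec:
    # Vector from point b to point a.
    return [a[i] - b[i] for i in range(3)]


def plane_normal(p1: Sequence[float], p2: Sequence[float], p3: Sequence[float]) -> Vec:
    """Unit normal of the plane through three atom positions."""
    u, v = _sub(p2, p1), _sub(p3, p1)
    # Cross product u x v is perpendicular to both in-plane vectors.
    n = [u[1] * v[2] - u[2] * v[1],
         u[2] * v[0] - u[0] * v[2],
         u[0] * v[1] - u[1] * v[0]]
    length = math.sqrt(sum(c * c for c in n))
    if length == 0:
        raise ValueError("atoms are collinear; no unique plane")
    return [c / length for c in n]


def rotate(point: Sequence[float], angles_deg: Sequence[float]) -> Vec:
    """Rotate a point about x, then y, then z by the given angles (degrees)."""
    x, y, z = point
    ax, ay, az = (math.radians(a) for a in angles_deg)
    # Rotation about the x axis.
    y, z = y * math.cos(ax) - z * math.sin(ax), y * math.sin(ax) + z * math.cos(ax)
    # Rotation about the y axis.
    x, z = x * math.cos(ay) + z * math.sin(ay), -x * math.sin(ay) + z * math.cos(ay)
    # Rotation about the z axis.
    x, y = x * math.cos(az) - y * math.sin(az), x * math.sin(az) + y * math.cos(az)
    return [x, y, z]


print(plane_normal((0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)))  # [0.0, 0.0, 1.0]
print(rotate([1.0, 0.0, 0.0], [0.0, 0.0, 90.0]))  # approximately [0, 1, 0]
```

Rotating a whole molecule means applying `rotate` to every atom position with the same angles.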

plot_radviz_comparison(category_column, columns=[], rows=[], filters={}, point_size=30, **kwargs)

return the plot axis of a RadViz graph

RadViz is a way of visualizing multivariate data. It is based on a simple spring-tension minimization algorithm. A set of anchor points is placed in a plane, here equally spaced on a unit circle, and each point represents a single attribute. Each sample in the data set is then treated as if it were attached to every anchor by a spring whose stiffness is proportional to the numerical value of that attribute (values are normalized to the unit interval). A dot representing the sample is drawn where the sample settles, i.e. where the forces acting on it are in equilibrium. Each dot is colored according to the class the sample belongs to.
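The spring model described above has a closed-form equilibrium. Let the anchors A_j be equally spaced on the unit circle and let a sample's normalized attribute values w_j act as spring stiffnesses. The sample then settles at the weighted mean sum_j(w_j A_j) / sum_j(w_j). The pure-Python sketch below computes that position with per-attribute min-max normalization. It illustrates the technique rather than the code behind `plot_radviz_comparison`. The helper `radviz_positions` is hypothetical, and placing an all-zero sample at the centre is this sketch's own choice.

```python
import math
from typing import List, Tuple


def radviz_positions(samples: List[List[float]]) -> List[Tuple[float, float]]:
    """Hypothetical helper: spring-equilibrium point of each sample in a RadViz plot."""
    n_attr = len(samples[0])
    # Anchor points equally spaced on the unit circle, one per attribute.
    anchors = [(math.cos(2 * math.pi * j / n_attr), math.sin(2 * math.pi * j / n_attr))
               for j in range(n_attr)]
    # Min-max normalize each attribute (column) to the unit interval.
    columns = list(zip(*samples))
    lows = [min(col) for col in columns]
    spans = [(max(col) - min(col)) or 1.0 for col in columns]  # avoid division by zero
    positions = []
    for sample in samples:
        weights = [(v - lo) / span for v, lo, span in zip(sample, lows, spans)]
        total = sum(weights)
        if total == 0:
            positions.append((0.0, 0.0))  # no spring pulls: leave at the centre
            continue
        # Equilibrium of springs with stiffness w_j: weighted mean of the anchors.
        x = sum(w * ax for w, (ax, _) in zip(weights, anchors)) / total
        y = sum(w * ay for w, (_, ay) in zip(weights, anchors)) / total
        positions.append((x, y))
    return positions


data = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [1.0, 1.0, 1.0]]
for point in radviz_positions(data):
    print(point)
```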

remove_columns(columns)

remove_non_conformers(cutoff=0.0)

removes runs with negative frequencies

remove_non_optimised()

removes runs that were not optimised
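Gaussian reports imaginary vibrational modes as negative frequencies. A structure with any such mode is therefore not a true minimum, which is why `remove_non_conformers` drops those runs. The pure-Python sketch below shows the kind of filtering these two methods describe, applied to plain dict records. Several parts of it are assumptions, not pygauss's documented behaviour: the record layout, the helper name `keep_conformers`, and the reading of `cutoff` as "frequencies below `cutoff` count as imaginary".

```python
from typing import Dict, List


def keep_conformers(runs: List[Dict], cutoff: float = 0.0) -> List[Dict]:
    """Hypothetical filter (not pygauss API): keep optimised runs with no frequency below cutoff."""
    kept = []
    for run in runs:
        if not run.get("optimised", False):
            continue  # analogous to remove_non_optimised
        freqs = run.get("frequencies") or []
        if any(f < cutoff for f in freqs):
            continue  # imaginary (negative) mode: a saddle point, not a conformer
        kept.append(run)
    return kept


runs = [
    {"id": 0, "optimised": True, "frequencies": [35.2, 120.0]},
    {"id": 1, "optimised": True, "frequencies": [-54.1, 80.0]},
    {"id": 2, "optimised": False, "frequencies": [40.0]},
]
print([run["id"] for run in keep_conformers(runs)])  # [0]
```

Note that in this sketch a run with no frequency data is kept. Whether pygauss does the same is not stated here.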

remove_rows(rows)

remove one or more rows of molecules

Parameters:	rows (int or list of ints) – the rows to remove

set_folder(folderpath='', server=None, username=None, passwrd=None)

yield_mol_images(rows=[], filters={}, mtype='optimised', sort_columns=[], align_to=[], rotations=[[0.0, 0.0, 0.0]], gbonds=True, represent='ball_stick', zoom=1.0, width=300, height=300, axis_length=0, relative=False, minval=-1, maxval=1, highlight=[], active=False, sopt_min_energy=20.0, sopt_cutoff_energy=0.0, atom_groups=[], alpha=0.5, transparent=False, hbondwidth=5, eunits='kJmol-1', no_hbonds=False, ipyimg=True)

yields molecule images

Parameters:

mtype – ‘initial’, ‘optimised’, ‘nbo’, ‘highlight’, ‘highlight-initial’, ‘sopt’ or ‘hbond’
rows (int or list) – index for the row of each molecule to plot (all plotted if empty)
filters (dict) – {columns:values} to filter by
sort_columns (list of str) – columns to sort by
align_to ([int, int, int]) – align geometries to the plane containing these atoms
rotations (list of [float, float, float]) – for each rotation set [x,y,z] an image will be produced
gbonds (bool) – guess bonds between atoms (via distance)
represent (str) – representation of molecule (‘none’, ‘wire’, ‘vdw’ or ‘ball_stick’)
zoom (float) – zoom level of images
width (int) – width of original images
height (int) – height of original images (although width takes precedence)
axis_length (float) – length of x,y,z axes in negative and positive directions
relative (bool) – coloring of nbo atoms scaled to min/max values in atom set (for nbo mtype)
minval (float) – coloring of nbo atoms scaled to absolute min (for nbo mtype)
maxval (float) – coloring of nbo atoms scaled to absolute max (for nbo mtype)
highlight (list of lists) – atom indexes to highlight (for highlight mtype)
eunits (str) – the units of energy to return (for sopt/hbond mtype)
sopt_min_energy (float) – minimum energy to show (for sopt/hbond mtype)
sopt_cutoff_energy (float) – energy below which bonds will be dashed (for sopt mtype)
alpha (float) – alpha color value of geometry (for highlight/sopt/hbond mtypes)
transparent (bool) – whether atoms should be transparent (for highlight/sopt/hbond mtypes)
hbondwidth (float) – width of lines depicting interaction (for hbond mtypes)
atom_groups ([list or str, list or str]) – restrict interactions to between two lists (or identifiers) of atom indexes (for sopt/hbond mtypes)
no_hbonds (bool) – whether to ignore H-Bonds in the calculation
ipyimg (bool) – whether to return an IPython image, PIL image otherwise
Yields:

indx (int) – the row index of the molecule
mol (IPython.display.Image or PIL.Image) – an image of the molecule in the format specified by ipyimg

pygauss.analysis.unpack_and_make_molecule(val_dict)