`doenut`

DoENUT

Design of Experiments Numerical Utility Toolkit

DoENUT is a set of classes and functions designed to make Design of Experiments easier in Python.

To get started, see the workbooks under Tutorials or look at AveragedModel and ModifiableDataSet.

As a very quick start, assuming your data is split into a pair of pandas DataFrames, one for the input data and one for the responses, the following code will create a standard model and generate some stats on how good it is:

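import doenut

# inputs and responses are the two pandas DataFrames described above
dataset = doenut.data.ModifiableDataSet(inputs, responses)
model = doenut.models.AveragedModel(dataset)
# Read the goodness-of-fit statistics off the model
r2, q2 = model.r2, model.q2
print(f"R2 is {r2}, Q2 is {q2}")
# Plot the fit summary, then the coefficients labelled by input column
doenut.plot.plot_summary_of_fit_small(r2, q2)
doenut.plot.coeff_plot(model.coeffs,
                       labels=list(dataset.get().inputs.columns),
                       errors='p95',
                       normalise=True)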

Package Contents

Functions

`add_higher_order_terms`(→ Tuple[pandas.DataFrame, List])	Generate a saturated set of inputs by adding the power and interaction terms
`autotune_model`(inputs, responses, source_list[, ...])	Attempts to automatically tune a parsimonious model
`average_replicates`(→ Tuple[pandas.DataFrame, ...)	Averages inputs that are the same
`calc_ave_coeffs_and_errors`(coeffs, labels[, errors, ...])	Calculates averaged coefficients and their errors for a coefficient plot
`Calculate_Q2`(→ float)	A different way of calculating Q2
`Calculate_R2`(→ float)	Calculates R2 from input data
`dunk`(→ None)	dunk your doenut
`find_replicates`(→ numpy.array)	Find experimental settings that are replicates
`map_chemical_space`(unscaled_model, x_key, y_key, ...)	Calculates a three way map of chemical space for plotting
`orthogonal_scaling`(→ Tuple[pandas.DataFrame, float, float])	Calculates the orthogonal scaling of an array along an axis
`predict_from_model`(model, inputs, input_selector)	Reorganises the inputs and makes a prediction
`scale_1D_data`(scaler, data[, do_fit])	Scales one-dimensional data using the supplied scaler
`scale_by`(→ pandas.DataFrame)	Scales a dataframe orthogonally using the supplied parameters according to result = (data - Mj) / Rj
`set_log_level`(→ None)	Sets the global log level for the module
`train_model`(→ Tuple[sklearn.linear_model, ...)	A simple function to train a model

Attributes

__version__

doenut.__version__

doenut.add_higher_order_terms(inputs: pandas.DataFrame, add_squares: bool = True, add_interactions: bool = True, column_list: list = []) → Tuple[pandas.DataFrame, List][source]

Generate a saturated set of inputs by adding the power and interaction terms. Currently does not go above a power of 2.

Parameters:

inputs (pd.DataFrame) – The data to generate from
add_squares (bool) – (Optional) Whether to add square terms, e.g. x_1**2 (Default value = True)
add_interactions (bool) – (Optional) Whether to add interaction terms, e.g. x_1*x_2 (Default value = True)
column_list (list) – (Optional) Which columns to generate from (Default value = [])

Returns:

Tuple of the saturated inputs, and a list of which inputs created which input column.

Return type:

Tuple[pandas.DataFrame, List]
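
As a rough illustration of what a saturated input set looks like, the sketch below builds square and interaction columns from a plain dictionary of columns. It is a standard-library stand-in, not doenut's implementation: the function name add_terms_sketch, the column naming scheme, and the shape of the returned source list are all assumptions made for this example.

```python
from itertools import combinations
from typing import Dict, List, Tuple


def add_terms_sketch(
    inputs: Dict[str, List[float]],
    add_squares: bool = True,
    add_interactions: bool = True,
) -> Tuple[Dict[str, List[float]], List[List[str]]]:
    """Hypothetical helper: return saturated columns and their source inputs."""
    # Start with a copy of the main terms; each one is its own source.
    saturated = {name: list(values) for name, values in inputs.items()}
    sources = [[name] for name in inputs]

    if add_squares:
        for name, values in inputs.items():
            saturated[f"{name}**2"] = [v * v for v in values]
            sources.append([name, name])

    if add_interactions:
        # Every unordered pair of distinct inputs gives one interaction term.
        for a, b in combinations(inputs, 2):
            saturated[f"{a}*{b}"] = [x * y for x, y in zip(inputs[a], inputs[b])]
            sources.append([a, b])

    return saturated, sources


columns, sources = add_terms_sketch({"x_1": [1.0, 2.0], "x_2": [3.0, 4.0]})
print(list(columns))
```

With two inputs this yields five columns: the two main terms, two squares, and one interaction.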

doenut.autotune_model(inputs, responses, source_list, response_selector=[0], use_scaled_inputs=True, do_scaling_here=True, drop_duplicates='average', errors='p95', normalise=True, do_hierarchical=True, remove_significant=False)[source]

Attempts to automatically tune a parsimonious model

TODO: update to new code and remove redundant parameters

Parameters:

inputs – The input data to train on
responses – The response values for the input data
source_list –
response_selector – (Optional) Which columns in responses to use (Default value = [0])
use_scaled_inputs – (Optional) Whether to scale the inputs before calculations (Default value = True)
do_scaling_here – (Optional) Whether to scale each set of train/test data (Default value = True)
drop_duplicates – (Optional) How to handle duplicate input values: ignore them ('no'), 'average' them, or 'Drop' them (Default value = 'average')
errors – (Optional) 'p95' for 95th percentile or 'std' for standard deviation for error calculation (Default value = 'p95')
normalise – (Optional) Whether to normalise the coefficients for error calculation (Default value = True)
do_hierarchical – (Optional) Whether to maintain a hierarchical model (Default value = True)
remove_significant – (Optional) If True, the model continues removing terms until only one is left (Default value = False)

Returns:

A tuple of the terms used in the final model and the final model.

Return type:

tuple

doenut.average_replicates(inputs: pandas.DataFrame, responses: pandas.DataFrame) → Tuple[pandas.DataFrame, pandas.DataFrame][source]

Averages inputs that are the same

Parameters:

inputs (pd.DataFrame) – The input data to average
responses (pd.DataFrame) – The responses to average

Returns:

A tuple of the averaged inputs and responses

Return type:

Tuple[pandas.DataFrame, pandas.DataFrame]
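
The idea of averaging replicates can be sketched without pandas: group rows with identical input settings and take the mean of their responses. The helper name average_replicates_sketch and the list-of-tuples data layout are assumptions for illustration only; the library itself works on DataFrames and may order or index its output differently.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, List, Sequence, Tuple


def average_replicates_sketch(
    inputs: Sequence[Sequence[float]],
    responses: Sequence[float],
) -> Tuple[List[Tuple[float, ...]], List[float]]:
    """Hypothetical helper: one row per unique input, with the mean response."""
    groups: Dict[Tuple[float, ...], List[float]] = defaultdict(list)
    for row, response in zip(inputs, responses):
        # Rows with identical settings share a key, so their responses pool.
        groups[tuple(row)].append(response)

    # Dicts keep insertion order, so unique rows appear in first-seen order.
    averaged_inputs = list(groups)
    averaged_responses = [mean(values) for values in groups.values()]
    return averaged_inputs, averaged_responses


rows = [(1.0, 0.0), (0.0, 1.0), (1.0, 0.0)]
measured = [10.0, 20.0, 14.0]
avg_inputs, avg_responses = average_replicates_sketch(rows, measured)
print(avg_inputs, avg_responses)
```

Here the first and third rows are replicates, so they collapse into a single row whose response is the mean of 10.0 and 14.0.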

doenut.calc_ave_coeffs_and_errors(coeffs, labels, errors='std', normalise=False)[source]

Calculates the averaged coefficients and their errors for a coefficient plot. Set errors to 'std' for standard deviation, or to 'p95' for the 95th percentile (approximated by 2*std).

Parameters:

coeffs – The coefficients to calculate from
labels – No longer used?
errors – The type of error to calculate, 'std' or 'p95' (Default value = 'std')
normalise – Whether to normalise the data prior to calculation (Default value = False)

Returns:

A tuple of the averaged coefficients and their error bars

Return type:

tuple
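
The two error modes can be illustrated with a small standard-library sketch. It assumes coeffs holds one list of coefficients per fitted model, uses the sample standard deviation, and leaves out the normalise option; none of these details are confirmed by the reference above, and average_with_errors is a hypothetical name.

```python
from statistics import mean, stdev
from typing import List, Sequence, Tuple


def average_with_errors(
    coeff_runs: Sequence[Sequence[float]],
    errors: str = "std",
) -> Tuple[List[float], List[float]]:
    """Hypothetical helper: per-coefficient mean and error bar."""
    # Transpose so each tuple holds one coefficient across all runs.
    per_coefficient = list(zip(*coeff_runs))
    averages = [mean(values) for values in per_coefficient]
    spreads = [stdev(values) for values in per_coefficient]

    if errors == "p95":
        # 95th percentile approximated as two standard deviations.
        spreads = [2 * s for s in spreads]
    elif errors != "std":
        raise ValueError(f"unknown error type: {errors!r}")
    return averages, spreads


runs = [[1.0, 4.0], [2.0, 4.0], [3.0, 4.0]]
ave_std, err_std = average_with_errors(runs, errors="std")
ave_p95, err_p95 = average_with_errors(runs, errors="p95")
print(ave_std, err_std, err_p95)
```

The 'p95' bars are exactly twice the 'std' bars, which matches the approximation stated in the description.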

doenut.Calculate_Q2(ground_truth: pandas.DataFrame, predictions: pandas.DataFrame, train_responses: pandas.DataFrame, key: str, word: str = 'test') → float[source]

A different way of calculating Q2: this uses the mean from the training data, not the test ground truth.

Parameters:

ground_truth (pd.DataFrame) – The actual response values of the test set
predictions (pd.DataFrame) – The predictions of the model for the test set
train_responses (pd.DataFrame) – The response values of the training set
key (str) – Which column in the ground_truth we are predicting
word (str) – The mode to run in (Default value = 'test')

Returns:

The calculated coefficient (R2/Q2)

Return type:

float
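
A minimal sketch of this variant, assuming the usual form Q2 = 1 - PRESS / SS_total, where SS_total is measured around the mean of the training responses. The function name and the plain-list inputs are assumptions; the library version takes DataFrames plus the key and word arguments.

```python
from typing import Sequence


def q2_with_training_mean(
    ground_truth: Sequence[float],
    predictions: Sequence[float],
    train_responses: Sequence[float],
) -> float:
    """Hypothetical helper: Q2 with the baseline taken from the training set."""
    train_mean = sum(train_responses) / len(train_responses)
    # Prediction error sum of squares on the test set.
    press = sum((y - p) ** 2 for y, p in zip(ground_truth, predictions))
    # Total sum of squares around the training mean, not the test mean.
    ss_total = sum((y - train_mean) ** 2 for y in ground_truth)
    return 1 - press / ss_total


q2 = q2_with_training_mean(
    ground_truth=[3.0, 5.0],
    predictions=[2.0, 6.0],
    train_responses=[1.0, 3.0],
)
print(q2)
```

In this example PRESS is 2 and the total sum of squares around the training mean of 2.0 is 10, so Q2 comes out at about 0.8.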

doenut.Calculate_R2(ground_truth: pandas.DataFrame, predictions: pandas.DataFrame, key: str, word: str = 'test') → float[source]

Calculates R2 from input data. You can use this to calculate Q2 if you're using the test ground truth as the mean; otherwise use Calculate_Q2. This is thought to be what Modde uses for PLS fitting.

Parameters:

ground_truth (pd.DataFrame) – The actual response values
predictions (pd.DataFrame) – What the model guessed as the response values
key (str) – The column name in ground_truth that we predicted
word (str) – What mode we were working in (Default value = 'test')

Returns:

The R2 of the model on this data, or the Q2 if in test mode.

Return type:

float
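
For comparison, the standard coefficient of determination takes its baseline from the mean of the ground truth itself. The sketch below assumes the textbook form R2 = 1 - SS_residual / SS_total; the function name r2_sketch and the plain-list inputs are illustrative and not part of doenut.

```python
from typing import Sequence


def r2_sketch(ground_truth: Sequence[float], predictions: Sequence[float]) -> float:
    """Hypothetical helper: textbook R2 against the mean of ground_truth."""
    truth_mean = sum(ground_truth) / len(ground_truth)
    ss_residual = sum((y - p) ** 2 for y, p in zip(ground_truth, predictions))
    ss_total = sum((y - truth_mean) ** 2 for y in ground_truth)
    return 1 - ss_residual / ss_total


r2 = r2_sketch([1.0, 2.0, 3.0, 4.0], [1.5, 1.5, 3.5, 3.5])
print(r2)
```

Applied to held-out data, the same formula gives a Q2 whose baseline is the test-set mean, which is the case the description refers to.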

doenut.dunk(setting: str | None = None) → None[source]

dunk your doenut

Parameters: setting (str, default None) – what you are dunking it into

doenut.find_replicates(inputs: pandas.DataFrame) → numpy.array[source]

Find experimental settings that are replicates

Parameters: inputs (pd.DataFrame) – The dataframe to parse
Returns: A series of indices of all the rows which are replicates
Return type: np.array
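
One way to picture replicate detection is to count how often each input row occurs and report the positions of rows that occur more than once. This sketch assumes every member of a replicate group is reported, which is one reading of the description; find_replicate_indices is a hypothetical name and the library returns a numpy array rather than a list.

```python
from collections import Counter
from typing import List, Sequence


def find_replicate_indices(rows: Sequence[Sequence[float]]) -> List[int]:
    """Hypothetical helper: indices of rows whose settings occur more than once."""
    counts = Counter(tuple(row) for row in rows)
    return [index for index, row in enumerate(rows) if counts[tuple(row)] > 1]


settings = [(1.0, 0.0), (0.0, 1.0), (1.0, 0.0), (0.0, 0.0)]
replicate_rows = find_replicate_indices(settings)
print(replicate_rows)
```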

doenut.map_chemical_space(unscaled_model, x_key, y_key, c_key, x_limits, y_limits, constant, n_points, hook_function)[source]

Calculates a three way map of chemical space for plotting

TODO: Should move this to doenut.plot

Parameters:

unscaled_model – The model to plot
x_key – What key to use for the X axis
y_key – What key to use for the Y axis
c_key – What key to use for the C axis
x_limits – Tuple of min/max range of X to plot
y_limits – Tuple of min/max range of Y to plot
constant – The value for C
n_points – How many marks along each axis to generate
hook_function – A custom data processing function for post processing the data

Returns:

Three meshes of the model’s predictions for the keys/ranges predicted.

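
The general shape of such a map can be sketched as a grid evaluation: sample n_points along X and Y, hold the third factor at a constant, and evaluate a prediction at every grid node. Everything here is an assumption for illustration: the predict callable stands in for the model, the hook_function step is omitted, and the choice to return X, Y, and prediction meshes is only one reading of "three meshes".

```python
from typing import Callable, List, Tuple

Mesh = List[List[float]]


def linspace(start: float, stop: float, n: int) -> List[float]:
    """Evenly spaced values from start to stop, inclusive."""
    if n == 1:
        return [start]
    step = (stop - start) / (n - 1)
    return [start + i * step for i in range(n)]


def map_grid_sketch(
    predict: Callable[[float, float, float], float],
    x_limits: Tuple[float, float],
    y_limits: Tuple[float, float],
    constant: float,
    n_points: int,
) -> Tuple[Mesh, Mesh, Mesh]:
    """Hypothetical helper: X, Y and prediction meshes at a fixed third value."""
    xs = linspace(x_limits[0], x_limits[1], n_points)
    ys = linspace(y_limits[0], y_limits[1], n_points)
    # Rows follow Y and columns follow X, as in a typical meshgrid layout.
    x_mesh = [[x for x in xs] for _ in ys]
    y_mesh = [[y for _ in xs] for y in ys]
    z_mesh = [[predict(x, y, constant) for x in xs] for y in ys]
    return x_mesh, y_mesh, z_mesh


def toy_model(x: float, y: float, c: float) -> float:
    # Stand-in for a fitted model's prediction.
    return x + 10 * y + 100 * c


x_mesh, y_mesh, z_mesh = map_grid_sketch(toy_model, (0.0, 1.0), (0.0, 2.0), 1.0, 3)
print(z_mesh[2][1])
```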

doenut.orthogonal_scaling(inputs: pandas.DataFrame, axis: int = 0) → Tuple[pandas.DataFrame, float, float][source]

Calculates the orthogonal scaling of an array along an axis

Parameters:

inputs (pd.DataFrame) – the dataframe to scale
axis (int, default 0) – the axis to scale around (defaults to 0)

Returns:

pd.DataFrame – The scaled inputs
float – the Mj scaling parameter
float – the Rj scaling parameter

doenut.predict_from_model(model, inputs, input_selector)[source]

Reorganises the inputs and makes a prediction

Parameters:

model – the model to use
inputs – the saturated inputs
input_selector – the subset of inputs the model is using

Returns:

Tuple of the predictions and the terms used to generate them

Return type:

tuple

doenut.scale_1D_data(scaler, data, do_fit=True)[source]

Scales one-dimensional data using the supplied scaler

Parameters:

scaler – the scaler to transform the data with
data – the data to scale
do_fit – whether to fit the scaler to the data first (Default value = True)

Returns:

pd.DataFrame – The scaled data
sklearn scaler – The scaler object

doenut.scale_by(new_data: pandas.DataFrame, mj: float, rj: float) → pandas.DataFrame[source]

Scales a dataframe orthogonally using the supplied parameters according to the equation:

result = (data - Mj) / Rj

Parameters:

new_data (pd.DataFrame) – the data to scale
mj (float) – the Mj parameter
rj (float) – the Rj parameter

Returns:

the scaled data

Return type:

pd.DataFrame
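
The equation above can be tried on a single column of numbers. The sketch assumes the common design-of-experiments convention that Mj is the midpoint of the range and Rj is half the range, so the scaled values run from -1 to 1; the reference does not state how Mj and Rj are computed, and both function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def orthogonal_scaling_sketch(values: Sequence[float]) -> Tuple[List[float], float, float]:
    """Hypothetical helper: scale one column to [-1, 1] and return Mj, Rj."""
    low, high = min(values), max(values)
    mj = (high + low) / 2  # assumed: midpoint of the range
    rj = (high - low) / 2  # assumed: half-width of the range
    return [(v - mj) / rj for v in values], mj, rj


def scale_by_sketch(new_values: Sequence[float], mj: float, rj: float) -> List[float]:
    """Apply result = (data - Mj) / Rj using previously computed parameters."""
    return [(v - mj) / rj for v in new_values]


scaled, mj, rj = orthogonal_scaling_sketch([20.0, 30.0, 40.0])
rescaled_new = scale_by_sketch([35.0], mj, rj)
print(scaled, mj, rj, rescaled_new)
```

Keeping Mj and Rj from the original data and reusing them on new points puts later predictions on the same scale as the training inputs.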

doenut.set_log_level(level: str | int) → None[source]

Sets the global log level for the module

Parameters: level (str | int) – logging module value representing the desired log level
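
The str | int annotation mirrors how Python's standard logging module accepts levels, either as a name or as a number. The sketch below shows only that standard-library behaviour on a throwaway logger; whether set_log_level forwards its argument to logging in this way is an assumption, and the logger name and helper are made up for the example.

```python
import logging
from typing import Union

# Throwaway logger used only for this illustration.
sketch_logger = logging.getLogger("doenut_sketch")


def set_level_sketch(level: Union[str, int]) -> int:
    """Hypothetical helper: set a level by name or number, return it as an int."""
    sketch_logger.setLevel(level)
    return sketch_logger.level


by_name = set_level_sketch("DEBUG")
by_number = set_level_sketch(logging.WARNING)
print(by_name, by_number)
```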

doenut.train_model(inputs: pandas.DataFrame, responses: pandas.DataFrame, test_responses: pandas.DataFrame, do_scaling_here: bool = False, fit_intercept: bool = False) → Tuple[sklearn.linear_model, pandas.DataFrame, float, List[Any]][source]

A simple function to train a model

Parameters:

inputs – full set of terms for the model (x_n)
responses – expected responses for the inputs (ground truth, y)
test_responses – expected responses for separate test data (if used)
do_scaling_here – whether to scale the data (Default value = False)
fit_intercept – whether to fit the intercept (Default value = False)

Returns:

sklearn.linear_model – A model fitted to the data
pd.DataFrame – the inputs used
float – the R2 of that model
List[Any] – the predictions that model makes for the original inputs
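
To show what the fit_intercept switch changes, here is a one-predictor least-squares sketch written with the standard library. It is a stand-in for illustration, not the scikit-learn model that train_model returns: fit_line_sketch is a hypothetical name, scaling is left out, and R2 is computed against the mean of the responses.

```python
from typing import List, Sequence, Tuple


def fit_line_sketch(
    x: Sequence[float],
    y: Sequence[float],
    fit_intercept: bool = False,
) -> Tuple[float, float, float, List[float]]:
    """Hypothetical helper: slope, intercept, R2 and predictions for y ~ x."""
    x_bar = sum(x) / len(x)
    y_bar = sum(y) / len(y)
    if fit_intercept:
        slope = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y)) / sum(
            (a - x_bar) ** 2 for a in x
        )
        intercept = y_bar - slope * x_bar
    else:
        # Line forced through the origin.
        slope = sum(a * b for a, b in zip(x, y)) / sum(a * a for a in x)
        intercept = 0.0

    predictions = [slope * a + intercept for a in x]
    ss_residual = sum((b - p) ** 2 for b, p in zip(y, predictions))
    ss_total = sum((b - y_bar) ** 2 for b in y)  # assumes y is not constant
    return slope, intercept, 1 - ss_residual / ss_total, predictions


xs = [-1.0, 0.0, 1.0]
ys = [1.0, 3.0, 5.0]  # generated from y = 2x + 3
with_intercept = fit_line_sketch(xs, ys, fit_intercept=True)
without_intercept = fit_line_sketch(xs, ys, fit_intercept=False)
print(with_intercept[:3], without_intercept[:3])
```

Both fits recover the same slope here, but the fit through the origin cannot absorb the offset of 3, so its R2 is far lower on these uncentred responses.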
