doenut

DoENUT

Design of Experiments Numerical Utility Toolkit

DoENUT is a set of classes and functions designed to make Design of Experiments easier in Python.

To get started, see the workbooks under Tutorials or look at AveragedModel and ModifiableDataSet.

As a very quick start, assuming your data is split into a pair of pandas DataFrames, one for the input data and one for the responses, the following code will create a standard model and generate some stats on how good it is:

dataset = doenut.data.ModifiableDataSet(inputs, responses)
model = doenut.models.AveragedModel(dataset)
r2, q2 = model.r2, model.q2
print(f"R2 is {r2}, Q2 is {q2}")
doenut.plot.plot_summary_of_fit_small(r2, q2)
doenut.plot.coeff_plot(model.coeffs,
                       labels=list(dataset.get().inputs.columns),
                       errors='p95',
                       normalise=True)
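The quick start above assumes that inputs and responses already exist. As a minimal sketch of what such a pair of DataFrames might look like (the column names and values here are invented for illustration and do not come from the DoENUT tutorials):

```python
import pandas as pd

# Hypothetical experiment: two factors and one measured response.
inputs = pd.DataFrame(
    {
        "temperature": [20.0, 20.0, 40.0, 40.0, 30.0],
        "concentration": [0.1, 0.5, 0.1, 0.5, 0.3],
    }
)
responses = pd.DataFrame({"yield": [12.0, 30.0, 25.0, 55.0, 31.0]})

# Row i of `responses` is the measurement taken at row i of `inputs`.
print(inputs.shape, responses.shape)
```

Both frames need the same number of rows, one row per experiment.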

Package Contents

Functions

add_higher_order_terms(→ Tuple[pandas.DataFrame, List])

Generate a saturated set of inputs by adding the power and interaction terms

autotune_model(inputs, responses, source_list[, ...])

Attempts to automatically tune a parsimonious model

average_replicates(→ Tuple[pandas.DataFrame, ...)

Averages inputs that are the same

calc_ave_coeffs_and_errors(coeffs, labels[, errors, ...])

Calculates averaged coefficients and their error bars for a coefficient plot

Calculate_Q2(→ float)

A different way of calculating Q2

Calculate_R2(→ float)

Calculates R2 from input data

dunk(→ None)

dunk your doenut

find_replicates(→ numpy.array)

Find experimental settings that are replicates

map_chemical_space(unscaled_model, x_key, y_key, ...)

Calculates a three way map of chemical space for plotting

orthogonal_scaling(→ Tuple[pandas.DataFrame, float, float])

Calculates the orthogonal scaling of an array along an axis

predict_from_model(model, inputs, input_selector)

Reorganises the inputs and makes a prediction

scale_1D_data(scaler, data[, do_fit])

Scales one-dimensional data with the supplied scaler

scale_by(→ pandas.DataFrame)

Scales a dataframe orthogonally using the supplied parameters according to the equation result = (data - Mj) / Rj

set_log_level(→ None)

Sets the global log level for the module

train_model(→ Tuple[sklearn.linear_model, ...)

A simple function to train a model

Attributes

__version__

doenut.__version__
doenut.add_higher_order_terms(inputs: pandas.DataFrame, add_squares: bool = True, add_interactions: bool = True, column_list: list = []) → Tuple[pandas.DataFrame, List]

Generate a saturated set of inputs by adding the power and interaction terms. Currently does not go above a power of 2.

Parameters:
  • inputs (pd.DataFrame) – The data to generate from

  • add_squares (bool) – (Optional) Whether to add square terms, e.g. x_1**2 (Default value = True)

  • add_interactions (bool) – (Optional) Whether to add interaction terms, e.g. x_1*x_2 (Default value = True)

  • column_list (list) – (Optional) Which columns to generate from (Default value = [])

Returns:

Tuple of the saturated inputs, and a list of which inputs created which input column.

Return type:

Tuple[pandas.DataFrame, List]
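To make the idea of a saturated input set concrete, here is a minimal sketch in plain pandas. It is not the DoENUT implementation: the helper name, the generated column names and the shape of the returned source list are all assumptions made for illustration.

```python
from itertools import combinations
from typing import List, Tuple

import pandas as pd


def sketch_higher_order_terms(
    inputs: pd.DataFrame,
    add_squares: bool = True,
    add_interactions: bool = True,
) -> Tuple[pd.DataFrame, List[List[str]]]:
    """Hypothetical helper: append x**2 and x_i*x_j columns to `inputs`."""
    saturated = inputs.copy()
    # Record which original column(s) produced each output column.
    sources = [[name] for name in inputs.columns]
    if add_squares:
        for name in inputs.columns:
            saturated[f"{name}**2"] = inputs[name] ** 2
            sources.append([name, name])
    if add_interactions:
        for a, b in combinations(inputs.columns, 2):
            saturated[f"{a}*{b}"] = inputs[a] * inputs[b]
            sources.append([a, b])
    return saturated, sources


demo = pd.DataFrame({"x_1": [1.0, 2.0], "x_2": [3.0, 4.0]})
saturated, sources = sketch_higher_order_terms(demo)
print(list(saturated.columns))
```

The real function may name its columns differently and may encode the source list as indices rather than names.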

doenut.autotune_model(inputs, responses, source_list, response_selector=[0], use_scaled_inputs=True, do_scaling_here=True, drop_duplicates='average', errors='p95', normalise=True, do_hierarchical=True, remove_significant=False)

Attempts to automatically tune a parsimonious model

TODO: update to new code and remove redundant parameters

Parameters:
  • inputs – The input data to train on

  • responses – The response values for the input data

  • source_list

  • response_selector – (Optional) Which columns in responses to use (Default value = [0])

  • use_scaled_inputs – (Optional) Whether to scale the inputs before calculations (Default value = True)

  • do_scaling_here – (Optional) Whether to scale each set of train/test data (Default value = True)

  • drop_duplicates – (Optional) How to treat duplicate input values: ignore them ('no'), 'average' them, or 'Drop' them (Default value = "average")

  • errors – (Optional) 'p95' for 95th percentile or 'std' for standard deviation for error calculation (Default value = "p95")

  • normalise – (Optional) Whether to normalise the coefficients for error calculation (Default value = True)

  • do_hierarchical – (Optional) Whether to maintain a hierarchical model (Default value = True)

  • remove_significant – (Optional) Model will continue removing terms until only one is left (Default value = False)

Returns:

A tuple of the terms used in the final model and the final model.

Return type:

tuple

doenut.average_replicates(inputs: pandas.DataFrame, responses: pandas.DataFrame) → Tuple[pandas.DataFrame, pandas.DataFrame]

Averages inputs that are the same

Parameters:
  • inputs (pd.DataFrame) – The input data to average

  • responses (pd.DataFrame) – The responses to average

Returns:

A tuple of the averaged inputs and responses

Return type:

Tuple[pandas.DataFrame, pandas.DataFrame]
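One way to picture replicate averaging is a pandas group-by over the input columns. The sketch below is only an illustration of that idea under the assumption that inputs and responses share an index and have distinct column names; the helper name is hypothetical and the library may order or index its output differently.

```python
import pandas as pd


def sketch_average_replicates(inputs: pd.DataFrame, responses: pd.DataFrame):
    """Hypothetical helper: average the responses of rows with identical inputs."""
    input_cols = list(inputs.columns)
    combined = pd.concat([inputs, responses], axis=1)
    # sort=False keeps the groups in order of first appearance.
    averaged = combined.groupby(input_cols, sort=False, as_index=False).mean()
    return averaged[input_cols], averaged[list(responses.columns)]


demo_inputs = pd.DataFrame({"x_1": [0, 1, 1], "x_2": [0, 1, 1]})
demo_responses = pd.DataFrame({"y": [1.0, 2.0, 4.0]})
avg_inputs, avg_responses = sketch_average_replicates(demo_inputs, demo_responses)
print(avg_responses)
```

Here the two rows at (1, 1) collapse into a single row whose response is the mean of 2.0 and 4.0.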

doenut.calc_ave_coeffs_and_errors(coeffs, labels, errors='std', normalise=False)

Calculates the averaged coefficients and their error bars for a coefficient plot. Set errors to 'std' for the standard deviation, or to 'p95' for the 95th percentile (approximated by 2*std).

Parameters:
  • coeffs – The coefficients to calculate from

  • labels – No longer used?

  • errors – The type of error to calculate, 'std' or 'p95' (Default value = "std")

  • normalise – Whether to normalise the data prior to calculation (Default value = False)

Returns:

A tuple of the averaged coefficients and their error bars

Return type:

tuple
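The error options described above can be sketched with NumPy. This assumes that coeffs holds one row of coefficients per fitted model, which the source does not state, and it leaves out the normalise option because its exact behaviour is not documented here. The helper name is hypothetical.

```python
import numpy as np


def sketch_ave_coeffs_and_errors(coeffs, errors="std"):
    """Hypothetical helper: mean coefficient per term plus an error bar."""
    coeffs = np.asarray(coeffs, dtype=float)  # assumed shape: (n_models, n_terms)
    ave = coeffs.mean(axis=0)
    std = coeffs.std(axis=0)
    if errors == "std":
        bars = std
    elif errors == "p95":
        # The 95th percentile is approximated as two standard deviations.
        bars = 2 * std
    else:
        raise ValueError(f"unknown error type: {errors!r}")
    return ave, bars


demo_coeffs = [[1.0, 2.0], [3.0, 2.0]]
ave, bars_std = sketch_ave_coeffs_and_errors(demo_coeffs, errors="std")
_, bars_p95 = sketch_ave_coeffs_and_errors(demo_coeffs, errors="p95")
print(ave, bars_std, bars_p95)
```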

doenut.Calculate_Q2(ground_truth: pandas.DataFrame, predictions: pandas.DataFrame, train_responses: pandas.DataFrame, key: str, word: str = 'test') → float

A different way of calculating Q2. This uses the mean from the training data, not the test ground truth.

Parameters:
  • ground_truth (pd.DataFrame) – The actual response values of the test set

  • predictions (pd.DataFrame) – The predictions of the model for the test set

  • train_responses (pd.DataFrame) – The response values of the training set

  • key (str) – Which column in the ground_truth we are predicting

  • word (str) – The mode to run in (Default value = "test")

Returns:

The calculated coefficient (R2/Q2)

Return type:

float

doenut.Calculate_R2(ground_truth: pandas.DataFrame, predictions: pandas.DataFrame, key: str, word: str = 'test') → float

Calculates R2 from input data. You can use this to calculate Q2 if you're using the test ground truth as the mean; otherwise use Calculate_Q2. This is thought to be what MODDE uses for PLS fitting.

Parameters:
  • ground_truth (pd.DataFrame) – The actual response values

  • predictions (pd.DataFrame) – What the model guessed as the response values

  • key (str) – The column name in ground_truth that we predicted

  • word (str) – What mode we were working in (Default value = "test")

Returns:

The R2 of the model on this data, or the Q2 if in test mode.

Return type:

float
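The difference between the two calculations is only the mean used for the total sum of squares. The sketch below uses the textbook form 1 - SS_res / SS_tot on plain arrays; it is assumed, not confirmed, that Calculate_R2 and Calculate_Q2 follow this form, and the helper names are hypothetical.

```python
import numpy as np


def sketch_r2(ground_truth, predictions):
    """1 - SS_res / SS_tot, with SS_tot taken about the mean of ground_truth."""
    y = np.asarray(ground_truth, dtype=float)
    p = np.asarray(predictions, dtype=float)
    ss_res = np.sum((y - p) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return 1.0 - ss_res / ss_tot


def sketch_q2(ground_truth, predictions, train_responses):
    """Same ratio, but SS_tot is taken about the mean of the training responses."""
    y = np.asarray(ground_truth, dtype=float)
    p = np.asarray(predictions, dtype=float)
    train_mean = np.asarray(train_responses, dtype=float).mean()
    ss_res = np.sum((y - p) ** 2)
    ss_tot = np.sum((y - train_mean) ** 2)
    return 1.0 - ss_res / ss_tot


truth = [1.0, 2.0, 3.0]
preds = [1.0, 2.0, 4.0]
train = [2.0, 3.0, 4.0]
r2_value = sketch_r2(truth, preds)
q2_value = sketch_q2(truth, preds, train)
print(r2_value, q2_value)
```

With these numbers the residual sum of squares is 1 in both cases, while the total sum of squares is 2 about the test mean and 5 about the training mean, so the two scores differ.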

doenut.dunk(setting: str | None = None) → None

dunk your doenut

Parameters:

setting (str, default None) – what you are dunking it into

doenut.find_replicates(inputs: pandas.DataFrame) → numpy.array

Find experimental settings that are replicates

Parameters:

inputs (pd.DataFrame) – The dataframe to parse

Returns:

A series of indices of all the rows which are replicates

Return type:

np.array
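As a rough illustration of finding replicate rows, pandas can flag every row that shares its values with at least one other row. This is a sketch of the concept only; the helper name is hypothetical, and the library may define or report replicates differently (for example by label rather than by position).

```python
import numpy as np
import pandas as pd


def sketch_find_replicates(inputs: pd.DataFrame) -> np.ndarray:
    """Hypothetical helper: positions of rows that occur more than once."""
    # keep=False marks every member of a duplicated group, not just the repeats.
    mask = inputs.duplicated(keep=False)
    return np.flatnonzero(mask.to_numpy())


demo_inputs = pd.DataFrame({"x_1": [0, 1, 0, 2], "x_2": [0, 1, 0, 2]})
replicate_rows = sketch_find_replicates(demo_inputs)
print(replicate_rows)
```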

doenut.map_chemical_space(unscaled_model, x_key, y_key, c_key, x_limits, y_limits, constant, n_points, hook_function)

Calculates a three way map of chemical space for plotting

TODO: Should move this to doenut.plot

Parameters:
  • unscaled_model – The model to plot

  • x_key – What key to use for the X axis

  • y_key – What key to use for the Y axis

  • c_key – What key to use for the C axis

  • x_limits – Tuple of min/max range of X to plot

  • y_limits – Tuple of min/max range of Y to plot

  • constant – The value for C

  • n_points – How many marks along each axis to generate

  • hook_function – A custom data processing function for post processing the data

Returns:

Three meshes of the model’s predictions for the specified keys and ranges.

doenut.orthogonal_scaling(inputs: pandas.DataFrame, axis: int = 0) → Tuple[pandas.DataFrame, float, float]

Calculates the orthogonal scaling of an array along an axis

Parameters:
  • inputs (pd.DataFrame) – the dataframe to scale

  • axis (int, default 0) – the axis to scale around (defaults to 0)

Returns:

  • pd.DataFrame – The scaled inputs

  • float – the Mj scaling parameter

  • float – the Rj scaling parameter

doenut.predict_from_model(model, inputs, input_selector)

Reorganises the inputs and makes a prediction

Parameters:
  • model – the model to use

  • inputs – the saturated inputs

  • input_selector – the subset of inputs the model is using

Returns:

Tuple of the predictions and the terms used to generate them

Return type:

tuple

doenut.scale_1D_data(scaler, data, do_fit=True)

Scales one-dimensional data with the supplied scaler, optionally fitting the scaler to the data first.

Parameters:
  • scaler – the scaler to transform the data with

  • data – the data to scale

  • do_fit – whether to fit the scaler to the data first (Default value = True)

Returns:

  • pd.DataFrame – The scaled data

  • scaler – The scaler object

doenut.scale_by(new_data: pandas.DataFrame, mj: float, rj: float) → pandas.DataFrame

Scales a dataframe orthogonally using the supplied parameters according to the equation:

result = (data - Mj) / Rj

Parameters:
  • new_data (pd.DataFrame) – the data to scale

  • mj (float) – the Mj parameter

  • rj (float) – the Rj parameter

Returns:

the scaled data

Return type:

pd.DataFrame
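The equation above can be tried out directly in pandas. The sketch below assumes that Mj is the midrange and Rj the half-range of each column, which is a common convention for orthogonal scaling in experimental design but is not stated in this reference; it also keeps Mj and Rj as per-column Series rather than single floats. Both helper names are hypothetical.

```python
import pandas as pd


def sketch_orthogonal_scaling(inputs: pd.DataFrame):
    """Hypothetical helper: scale each column onto the range [-1, 1]."""
    # Assumption: Mj is the midrange and Rj the half-range of each column.
    mj = (inputs.max() + inputs.min()) / 2
    rj = (inputs.max() - inputs.min()) / 2
    return (inputs - mj) / rj, mj, rj


def sketch_scale_by(new_data: pd.DataFrame, mj, rj) -> pd.DataFrame:
    """Apply result = (data - Mj) / Rj with previously computed parameters."""
    return (new_data - mj) / rj


train = pd.DataFrame({"x": [10.0, 20.0, 30.0]})
scaled, mj, rj = sketch_orthogonal_scaling(train)
new_scaled = sketch_scale_by(pd.DataFrame({"x": [25.0]}), mj, rj)
print(scaled["x"].tolist(), new_scaled["x"].tolist())
```

Reusing the parameters from the training data keeps new points on the same scale as the data the model was fitted to.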

doenut.set_log_level(level: str | int) → None

Sets the global log level for the module

Parameters:

level (str | int) – logging module value representing the desired log level
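The str | int type mirrors the standard logging module, where a level can be given either by name or as a numeric constant. A small sketch using only the standard library (the logger name is invented for illustration and is not necessarily the one DoENUT uses):

```python
import logging

logger = logging.getLogger("doenut_sketch")  # hypothetical logger name

logger.setLevel("DEBUG")  # level given by name
level_from_name = logger.level

logger.setLevel(logging.DEBUG)  # level given as an int constant
level_from_int = logger.level

print(level_from_name == level_from_int)
```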

doenut.train_model(inputs: pandas.DataFrame, responses: pandas.DataFrame, test_responses: pandas.DataFrame, do_scaling_here: bool = False, fit_intercept: bool = False) → Tuple[sklearn.linear_model, pandas.DataFrame, float, List[Any]]

A simple function to train a model

Parameters:
  • inputs – full set of terms for the model (x_n)

  • responses – expected responses for the inputs (ground truth, y)

  • test_responses – expected responses for separate test data (if used)

  • do_scaling_here – whether to scale the data (Default value = False)

  • fit_intercept – whether to fit the intercept (Default value = False)

Returns:

  • sklearn.linear_model – A model fitted to the data

  • pd.DataFrame – the inputs used

  • float – the R2 of that model

  • List[Any] – the predictions that model makes for the original inputs
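Since the return type names sklearn.linear_model, the fitting step can be pictured with scikit-learn directly. The sketch below assumes an ordinary LinearRegression with fit_intercept=False to match the default above; which estimator train_model actually uses is not stated in this reference, and the data is invented.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

# Invented, already scaled data that follows y = 2*x_1 + 3*x_2 exactly.
inputs = pd.DataFrame(
    {"x_1": [-1.0, -1.0, 1.0, 1.0], "x_2": [-1.0, 1.0, -1.0, 1.0]}
)
responses = pd.DataFrame({"y": [-5.0, 1.0, -1.0, 5.0]})

model = LinearRegression(fit_intercept=False)
model.fit(inputs, responses)

r2 = model.score(inputs, responses)  # R2 on the training data
predictions = model.predict(inputs)  # predictions for the original inputs
print(model.coef_, r2)
```

Because the data is exactly linear with no offset, the fitted coefficients recover 2 and 3 and the R2 is 1.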