doenut
DoENUT
Design of Experiments Numerical Utility Toolkit
DoENUT is a set of classes and functions designed to make Design of Experiments easier in Python.
To get started, see the workbooks under Tutorials or look at AveragedModel and ModifiableDataSet.
As a very quick start, assuming your data is split into a pair of pandas DataFrames, one for the input data and one for the responses, the following code will create a standard model and generate some stats on how good it is:
dataset = doenut.data.ModifiableDataSet(inputs, responses)
model = doenut.models.AveragedModel(dataset)
r2, q2 = model.r2, model.q2
print(f"R2 is {r2}, Q2 is {q2}")
doenut.plot.plot_summary_of_fit_small(r2, q2)
doenut.plot.coeff_plot(model.coeffs,
                       labels=list(dataset.get().inputs.columns),
                       errors='p95',
                       normalise=True)
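The quick start assumes that `inputs` and `responses` already exist. As a minimal sketch of what such a pair of DataFrames might look like, the following uses made-up factor names and values; none of the column names or numbers come from DoENUT itself.

```python
import pandas as pd

# Hypothetical experimental settings: one row per experiment, one column per factor.
inputs = pd.DataFrame(
    {
        "temperature": [20.0, 20.0, 60.0, 60.0, 40.0],
        "time": [1.0, 3.0, 1.0, 3.0, 2.0],
    }
)

# Hypothetical measured responses, row-aligned with `inputs`.
responses = pd.DataFrame({"yield": [12.0, 18.5, 30.1, 44.8, 27.0]})

# The two frames must describe the same experiments, in the same order.
assert len(inputs) == len(responses)
```

Keeping the two frames row-aligned matters because each response row is interpreted as the outcome of the settings in the matching input row.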
Package Contents
Functions
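doenut.add_higher_order_terms | Generate a saturated set of inputs by adding the power and interaction terms
doenut.autotune_model | Attempts to automatically tune a parsimonious model
doenut.average_replicates | Averages inputs that are the same
doenut.calc_ave_coeffs_and_errors | Calculates averaged coefficients and their error bars
doenut.Calculate_Q2 | A different way of calculating Q2
doenut.Calculate_R2 | Calculates R2 from input data
doenut.dunk | Dunk your doenut
doenut.find_replicates | Find experimental settings that are replicates
doenut.map_chemical_space | Calculates a three way map of chemical space for plotting
doenut.orthogonal_scaling | Calculates the orthogonal scaling of an array along an axis
doenut.predict_from_model | Reorganises the inputs and makes a prediction
doenut.scale_1D_data | Scales data with the supplied scaler, optionally fitting the scaler first
doenut.scale_by | Scales a dataframe orthogonally using the supplied parameters according to the equation result = (data - Mj) / Rj
doenut.set_log_level | Sets the global log level for the module
doenut.train_model | A simple function to train a model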
Attributes
- doenut.__version__
- doenut.add_higher_order_terms(inputs: pandas.DataFrame, add_squares: bool = True, add_interactions: bool = True, column_list: list = []) -> Tuple[pandas.DataFrame, List]
Generate a saturated set of inputs by adding the power and interaction terms. Currently does not go above a power of 2.
- Parameters:
inputs (pd.DataFrame) – The data to generate from
add_squares (bool) – (Optional) Whether to add square terms, e.g. x_1**2 (Default value = True)
add_interactions (bool) – (Optional) Whether to add interaction terms, e.g. x_1*x_2 (Default value = True)
column_list (list) – (Optional) Which columns to generate from (Default value = [])
- Returns:
Tuple of the saturated inputs, and a list of which inputs created which input column.
- Return type:
Tuple[pandas.DataFrame, List]
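To illustrate what "saturating" a set of inputs means, here is a minimal sketch that adds square and pairwise interaction columns with pandas. It is not DoENUT's implementation: the helper name `add_terms_sketch`, the generated column names, and the layout of the source list are all assumptions made for this example.

```python
from itertools import combinations

import pandas as pd


def add_terms_sketch(inputs, add_squares=True, add_interactions=True):
    """Illustrative only: not DoENUT's implementation."""
    saturated = inputs.copy()
    # Record which original column(s) produced each output column.
    sources = [[name] for name in inputs.columns]
    if add_squares:
        for name in inputs.columns:
            saturated[f"{name}**2"] = inputs[name] ** 2
            sources.append([name])
    if add_interactions:
        for a, b in combinations(inputs.columns, 2):
            saturated[f"{a}*{b}"] = inputs[a] * inputs[b]
            sources.append([a, b])
    return saturated, sources


inputs = pd.DataFrame({"x1": [1.0, 2.0], "x2": [3.0, 4.0]})
saturated, sources = add_terms_sketch(inputs)
```

With two factors this yields five columns: the two originals, two squares, and one interaction.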
- doenut.autotune_model(inputs, responses, source_list, response_selector=[0], use_scaled_inputs=True, do_scaling_here=True, drop_duplicates='average', errors='p95', normalise=True, do_hierarchical=True, remove_significant=False)
Attempts to automatically tune a parsimonious model.
TODO: update to new code and remove redundant parameters
- Parameters:
inputs – The input data to train on
responses – The response values for the input data
source_list –
response_selector – (Optional) Which columns in responses to use (Default value = [0])
use_scaled_inputs – (Optional) Whether to scale the inputs before calculations (Default value = True)
do_scaling_here – (Optional) Whether to scale each set of train/test data (Default value = True)
drop_duplicates – (Optional) How to handle duplicate input values: ignore them ('no'), average them ('average') or drop them ('Drop') (Default value = 'average')
errors – (Optional) 'p95' for 95th percentile or 'std' for standard deviation for error calculation (Default value = 'p95')
normalise – (Optional) Whether to normalise the coefficients for error calculation (Default value = True)
do_hierarchical – (Optional) Whether to maintain a hierarchical model (Default value = True)
remove_significant – (Optional) Model will continue removing terms until only one is left (Default value = False)
- Returns:
A tuple of the terms used in the final model and the final model.
- Return type:
tuple
- doenut.average_replicates(inputs: pandas.DataFrame, responses: pandas.DataFrame) -> Tuple[pandas.DataFrame, pandas.DataFrame]
Averages inputs that are the same.
- Parameters:
inputs (pd.DataFrame) – The input data to average
responses (pd.DataFrame) – The responses to average
- Returns:
A tuple of the averaged inputs and responses
- Return type:
Tuple[pandas.DataFrame, pandas.DataFrame]
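The idea of averaging replicates can be sketched with a pandas `groupby`. This is an independent illustration under the assumption that "the same" means identical values in every input column; it does not show how DoENUT implements the function, and the data is made up.

```python
import pandas as pd

inputs = pd.DataFrame({"x1": [1.0, 1.0, 2.0], "x2": [5.0, 5.0, 6.0]})
responses = pd.DataFrame({"y": [10.0, 12.0, 20.0]})

# Join the two frames so each response stays attached to its settings.
combined = pd.concat([inputs, responses], axis=1)

# Rows with identical input settings fall into one group; average their responses.
averaged = combined.groupby(
    list(inputs.columns), as_index=False, sort=False
).mean()

averaged_inputs = averaged[list(inputs.columns)]
averaged_responses = averaged[list(responses.columns)]
```

Here the first two rows share the same settings, so they collapse into one row whose response is the mean of 10.0 and 12.0.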
- doenut.calc_ave_coeffs_and_errors(coeffs, labels, errors='std', normalise=False)
Calculates the averaged coefficients and their error bars. Set errors to 'std' for standard deviation, or to 'p95' for the 95th percentile (approximated by 2*std).
- Parameters:
coeffs – The coefficients to calculate from
labels – No longer used?
errors – The type of error to calculate, 'std' or 'p95' (Default value = 'std')
normalise – Whether to normalise the data prior to calculation (Default value = False)
- Returns:
A tuple of the averaged coefficients and their error bars
- Return type:
tuple
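The two error options can be sketched with NumPy as follows. The coefficient values are made up, the use of the population standard deviation (`ddof=0`) is an assumption, and the `normalise` option is not covered here.

```python
import numpy as np

# Hypothetical coefficients from three fitted models (rows) for two terms (columns).
coeffs = np.array([[1.0, 4.0], [2.0, 4.0], [3.0, 4.0]])

# Average each term's coefficient across the models.
ave_coeffs = coeffs.mean(axis=0)

# ASSUMPTION: population standard deviation (ddof=0); the source does not say which.
std = coeffs.std(axis=0)

error_bars_std = std      # errors='std'
error_bars_p95 = 2 * std  # errors='p95', using the 2*std approximation
```

A term whose coefficient is identical in every model, like the second column here, gets an error bar of zero under either option.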
- doenut.Calculate_Q2(ground_truth: pandas.DataFrame, predictions: pandas.DataFrame, train_responses: pandas.DataFrame, key: str, word: str = 'test') -> float
A different way of calculating Q2. This uses the mean from the training data, not the test ground truth.
- Parameters:
ground_truth (pd.DataFrame) – The actual response values of the test set
predictions (pd.DataFrame) – The predictions of the model for the test set
train_responses (pd.DataFrame) – The response values of the training set
key (str) – Which column in the ground_truth we are predicting
word (str) – The mode to run in (Default value = 'test')
- Returns:
The calculated coefficient (R2/Q2)
- Return type:
float
- doenut.Calculate_R2(ground_truth: pandas.DataFrame, predictions: pandas.DataFrame, key: str, word: str = 'test') -> float
Calculates R2 from input data. You can use this to calculate Q2 if you're using the test ground truth as the mean; otherwise use Calculate_Q2. This is thought to be what Modde uses for PLS fitting.
- Parameters:
ground_truth (pd.DataFrame) – The actual response values
predictions (pd.DataFrame) – What the model guessed as the response values
key (str) – The column name in ground_truth that we predicted
word (str) – What mode we were working on (Default value = 'test')
- Returns:
The R2 of the model on this data, or the Q2 if in test mode.
- Return type:
float
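The difference between the two calculations comes down to which mean the total sum of squares is measured around. The sketch below assumes the usual definition 1 - SS_res / SS_tot, which the source does not spell out; the helper `coefficient` and all data values are hypothetical.

```python
import numpy as np


def coefficient(truth, predictions, mean):
    """1 - SS_res / SS_tot, with SS_tot measured around `mean`."""
    truth = np.asarray(truth, dtype=float)
    predictions = np.asarray(predictions, dtype=float)
    ss_res = np.sum((truth - predictions) ** 2)
    ss_tot = np.sum((truth - mean) ** 2)
    return 1.0 - ss_res / ss_tot


train_responses = [1.0, 2.0, 3.0]  # mean 2.0
test_truth = [2.0, 4.0]            # mean 3.0
test_predictions = [2.5, 3.5]

# Calculate_R2-style: total sum of squares around the test set's own mean.
r2_style = coefficient(test_truth, test_predictions, np.mean(test_truth))

# Calculate_Q2-style: total sum of squares around the training mean.
q2_style = coefficient(test_truth, test_predictions, np.mean(train_responses))
```

The same predictions give different scores (0.75 and 0.875 here) because the baseline they are compared against changes.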
- doenut.dunk(setting: str | None = None) -> None
Dunk your doenut.
- Parameters:
setting (str, default None) – What you are dunking it into
- doenut.find_replicates(inputs: pandas.DataFrame) -> numpy.array
Find experimental settings that are replicates
- Parameters:
inputs (pd.DataFrame) – The dataframe to parse
- Returns:
A series of indices of all the rows which are replicates
- Return type:
np.array
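One way to locate replicate rows is pandas' `DataFrame.duplicated`. This is a sketch of the concept rather than DoENUT's own code, and it assumes that a replicate is a row whose values match another row in every column.

```python
import numpy as np
import pandas as pd

inputs = pd.DataFrame(
    {"x1": [1.0, 2.0, 1.0, 3.0], "x2": [5.0, 6.0, 5.0, 7.0]}
)

# keep=False marks every member of a duplicated group, not just the later copies.
is_replicate = inputs.duplicated(keep=False)

# Convert the boolean mask into positional row indices.
replicate_rows = np.flatnonzero(is_replicate.to_numpy())
```

Rows 0 and 2 have identical settings, so both are reported.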
- doenut.map_chemical_space(unscaled_model, x_key, y_key, c_key, x_limits, y_limits, constant, n_points, hook_function)
Calculates a three way map of chemical space for plotting.
TODO: Should move this to doenut.plot
- Parameters:
unscaled_model – The model to plot
x_key – What key to use for the X axis
y_key – What key to use for the Y axis
c_key – What key to use for the C axis
x_limits – Tuple of min/max range of X to plot
y_limits – Tuple of min/max range of Y to plot
constant – The value for C
n_points – How many marks along each axis to generate
hook_function – A custom function for post-processing the data
- Returns:
Three meshes of the model's predictions for the keys/ranges predicted.
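The general shape of such a map can be sketched with `numpy.meshgrid`: two variables are swept over a grid while the third is held constant. The function `hypothetical_predict` stands in for a fitted model and is purely illustrative, and the exact contents of the three meshes returned by DoENUT are not confirmed by the source.

```python
import numpy as np


def hypothetical_predict(x, y, c):
    """Stand-in for a fitted model's prediction; purely illustrative."""
    return 1.0 + 2.0 * x + 3.0 * y + 0.5 * c


x_limits, y_limits = (0.0, 1.0), (10.0, 20.0)
constant, n_points = 4.0, 5

# n_points marks along each axis, expanded into every (x, y) combination.
x_mesh, y_mesh = np.meshgrid(
    np.linspace(*x_limits, n_points),
    np.linspace(*y_limits, n_points),
)

# The third variable is held at `constant` while predicting over the grid.
z_mesh = hypothetical_predict(x_mesh, y_mesh, constant)
```

The three arrays share one shape, which is the form contour and surface plotting functions typically expect.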
- doenut.orthogonal_scaling(inputs: pandas.DataFrame, axis: int = 0) -> Tuple[pandas.DataFrame, float, float]
Calculates the orthogonal scaling of an array along an axis
- Parameters:
inputs (pd.DataFrame) – the dataframe to scale
axis (int, default 0) – the axis to scale around (defaults to 0)
- Returns:
pd.DataFrame – The scaled inputs
float – the Mj scaling parameter
float – the Rj scaling parameter
- doenut.predict_from_model(model, inputs, input_selector)
Reorganises the inputs and makes a prediction.
- Parameters:
model – The model to use
inputs – The saturated inputs
input_selector – The subset of inputs the model is using
- Returns:
Tuple of the predictions and the terms used to generate them
- Return type:
tuple
- doenut.scale_1D_data(scaler, data, do_fit=True)
Scales data with the supplied scaler, optionally fitting the scaler first.
ELLA TODO: Document what this does and why.
- Parameters:
scaler – The scaler to transform the data with
data – The data to scale
do_fit – Whether to fit the scaler to the data first (Default value = True)
- Returns:
pd.DataFrame – The scaled data
sklearn scaler – The scaler object
- doenut.scale_by(new_data: pandas.DataFrame, mj: float, rj: float) -> pandas.DataFrame
Scales a dataframe orthogonally using the supplied parameters according to the equation:
result = (data - Mj) / Rj
- Parameters:
new_data (pd.DataFrame) – the data to scale
mj (float) – the Mj parameter
rj (float) – the Rj parameter
- Returns:
the scaled data
- Return type:
pd.DataFrame
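The equation above can be sketched directly in pandas. The source does not define Mj and Rj, so this example assumes the common orthogonal-scaling convention where Mj is the per-column midrange and Rj the per-column half-range; the helper `scale_by_sketch` and the data are hypothetical.

```python
import pandas as pd


def scale_by_sketch(data, mj, rj):
    # The equation given above: result = (data - Mj) / Rj
    return (data - mj) / rj


inputs = pd.DataFrame(
    {"temperature": [20.0, 40.0, 60.0], "time": [1.0, 2.0, 3.0]}
)

# ASSUMPTION: Mj is the per-column midrange and Rj the per-column half-range.
mj = (inputs.max(axis=0) + inputs.min(axis=0)) / 2
rj = (inputs.max(axis=0) - inputs.min(axis=0)) / 2

scaled = scale_by_sketch(inputs, mj, rj)

# New data is scaled with the parameters derived from the original inputs.
new_data = pd.DataFrame({"temperature": [50.0], "time": [1.5]})
new_scaled = scale_by_sketch(new_data, mj, rj)
```

Under this assumption every original column ends up spanning -1 to 1, and reusing the same Mj and Rj keeps later data on the same scale as the data the model was trained on.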
- doenut.set_log_level(level: str | int) -> None
Sets the global log level for the module
- Parameters:
level (str | int) – logging module value representing the desired log level
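The `str | int` type mirrors what Python's standard `logging` module accepts. The sketch below shows both forms on a throwaway logger; the logger name `doenut_sketch` and the helper `set_level_sketch` are hypothetical and say nothing about how DoENUT names or configures its own loggers.

```python
import logging

# "doenut_sketch" is a hypothetical logger name used only for this example.
logger = logging.getLogger("doenut_sketch")


def set_level_sketch(level):
    # Logger.setLevel accepts an int (logging.DEBUG) or a level name ("DEBUG").
    logger.setLevel(level)


set_level_sketch("DEBUG")
level_from_name = logger.level

set_level_sketch(logging.WARNING)
level_from_int = logger.level
```

Either form ends up stored as the numeric level, so the two are interchangeable for callers.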
- doenut.train_model(inputs: pandas.DataFrame, responses: pandas.DataFrame, test_responses: pandas.DataFrame, do_scaling_here: bool = False, fit_intercept: bool = False) -> Tuple[sklearn.linear_model, pandas.DataFrame, float, List[Any]]
A simple function to train a model
- Parameters:
inputs – Full set of terms for the model (x_n)
responses – Expected responses for the inputs (ground truth, y)
test_responses – Expected responses for separate test data (if used)
do_scaling_here – Whether to scale the data (Default value = False)
fit_intercept – Whether to fit the intercept (Default value = False)
- Returns:
sklearn.linear_model – A model fitted to the data
pd.DataFrame – the inputs used
float – the R2 of that model
List[Any] – the predictions that model makes for the original inputs
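The fit-then-score flow described here can be sketched without scikit-learn. This example swaps in `numpy.linalg.lstsq` as a plain least-squares fit with no intercept, which is a substitute technique and not the library's implementation; the data is made up and noise-free.

```python
import numpy as np

# Hypothetical data generated exactly from y = 2*x1 + 3*x2 (no noise).
inputs = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 1.0]])
responses = inputs @ np.array([2.0, 3.0])

# Least-squares fit with no intercept column, mirroring fit_intercept=False.
coeffs, *_ = np.linalg.lstsq(inputs, responses, rcond=None)

# Predictions for the original inputs.
predictions = inputs @ coeffs

# R2 of the fit on the training data.
ss_res = np.sum((responses - predictions) ** 2)
ss_tot = np.sum((responses - responses.mean()) ** 2)
r2 = 1.0 - ss_res / ss_tot
```

Because the data is noise-free, the fit recovers the generating coefficients and R2 is 1 up to floating-point error; real experimental data would give a lower value.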