doenut.data

DoENUT: data

These classes handle the grouping and storage of DataSets for Design of Experiments (DoE).

A DataSet consists of a set of inputs and a set of responses. This module provides two classes: a basic DataSet and ModifiableDataSet.

In most cases you will want ModifiableDataSet, as it supports the operations you need for DoE.

Package Contents

Classes

ModifiableDataSet

A Dataset that can be modified

DataSet

A dataset that has had all its modifiers applied.

class doenut.data.ModifiableDataSet(inputs: pandas.DataFrame, responses: pandas.DataFrame)

A Dataset that can be modified

Typically, when doing DoE you will want to apply various modifiers, such as scaling or column filtering, to your dataset. ModifiableDataSet is DoENUT’s mechanism for this.

A base dataset consists of two pandas DataFrames, one for the inputs and one for the responses. They should have the same number of rows.

Once you have built the dataset, you can add modifiers to it using add_modifier() or, more commonly, via helper methods such as filter() and scale(). Finally, get() returns a DataSet object containing the result of applying all the modifiers.

Be aware that modifiers are applied in the order they are added, and that modifiers cannot be removed once added. ModifiableDataSet makes deep copies of the dataframes, so the original data will not get changed.

All the modifier methods return self, so calls can be chained in the style of the builder pattern. For example:

    dataset = ModifiableDataSet(inputs, responses).filter(columns_to_keep).scale()
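To make the modifier-stack idea concrete, here is a minimal sketch in plain pandas. It is not DoENUT's implementation: SketchModifiableDataSet is a hypothetical class, modifiers are modelled as plain functions that take and return an (inputs, responses) pair rather than as DataSetModifier subclasses, and get() returns a tuple instead of a DataSet. It illustrates the three behaviours described above: deep copies on construction, modifiers applied in the order they were added, and methods returning self so calls can be chained.

```python
from typing import Callable, List, Tuple

import pandas as pd

# In this sketch a "modifier" is any function mapping an (inputs, responses)
# pair to a new pair. Real DoENUT modifiers are DataSetModifier subclasses.
Frames = Tuple[pd.DataFrame, pd.DataFrame]
Modifier = Callable[[pd.DataFrame, pd.DataFrame], Frames]


class SketchModifiableDataSet:
    """Hypothetical, simplified stand-in for ModifiableDataSet."""

    def __init__(self, inputs: pd.DataFrame, responses: pd.DataFrame):
        if len(inputs) != len(responses):
            raise ValueError("inputs and responses must have the same number of rows")
        # Deep copies, so the caller's DataFrames are never changed.
        self._inputs = inputs.copy(deep=True)
        self._responses = responses.copy(deep=True)
        self._modifiers: List[Modifier] = []

    def add_modifier(self, modifier: Modifier) -> "SketchModifiableDataSet":
        # Modifiers are only ever appended; there is no way to remove one.
        self._modifiers.append(modifier)
        return self  # returning self is what allows chaining

    def get(self) -> Frames:
        # Work on fresh copies so get() can be called repeatedly.
        inputs, responses = self._inputs.copy(), self._responses.copy()
        # Apply the modifiers in the order they were added.
        for modifier in self._modifiers:
            inputs, responses = modifier(inputs, responses)
        return inputs, responses


# Usage: keep only the "temp" input column, then double it.
inputs = pd.DataFrame({"temp": [20, 40, 60], "time": [1, 2, 3]})
responses = pd.DataFrame({"yield": [0.5, 0.7, 0.9]})

new_inputs, new_responses = (
    SketchModifiableDataSet(inputs, responses)
    .add_modifier(lambda i, r: (i[["temp"]], r))
    .add_modifier(lambda i, r: (i * 2, r))
    .get()
)
```

Since modifiers run in insertion order, the stack behaves like a pipeline: each function sees the output of the one before it, while the original inputs and responses stay untouched.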

get() → doenut.data.data_set.DataSet

Applies all the modifiers, in the order they were added, and returns the resulting DataSet.

add_modifier(modifier: Type[doenut.data.modifiers.data_set_modifier.DataSetModifier], **kwargs) → ModifiableDataSet

Adds a new modifier to the stack.

Parameters:
  • modifier (Type[DataSetModifier]) – The modifier class to add

  • kwargs – Any additional keyword arguments the modifier expects

filter(input_selector: List[str | int] = None, response_selector: List[str | int] = None) → ModifiableDataSet

Select a subset of the columns in this dataset. You must specify at least one selector. Each selector is a list of the column names or indices that you wish to keep.

Parameters:
  • input_selector (List[str | int]) – Filter for the input data (default: None)

  • response_selector (List[str | int]) – Filter for the response data (default: None)

Returns:

this dataset

Return type:

ModifiableDataSet
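The column selection that filter() describes can be sketched with ordinary pandas indexing. The helper names select_columns and filter_dataset are hypothetical, and the sketch assumes column labels are strings, so that any integer in a selector can safely be treated as a positional index; DoENUT's own handling may differ.

```python
from typing import List, Optional, Tuple, Union

import pandas as pd

Selector = Optional[List[Union[str, int]]]


def select_columns(df: pd.DataFrame, selector: Selector) -> pd.DataFrame:
    """Keep only the columns named (or positioned) in `selector`.

    None keeps every column. Assumes string column labels, so integers
    are unambiguous positional indices.
    """
    if selector is None:
        return df
    names = [df.columns[c] if isinstance(c, int) else c for c in selector]
    return df[names]


def filter_dataset(
    inputs: pd.DataFrame,
    responses: pd.DataFrame,
    input_selector: Selector = None,
    response_selector: Selector = None,
) -> Tuple[pd.DataFrame, pd.DataFrame]:
    # Mirrors the documented rule that at least one selector is required.
    if input_selector is None and response_selector is None:
        raise ValueError("specify at least one selector")
    return (
        select_columns(inputs, input_selector),
        select_columns(responses, response_selector),
    )


inputs = pd.DataFrame({"temp": [20, 40], "time": [1, 2], "ph": [7, 8]})
responses = pd.DataFrame({"yield": [0.5, 0.7], "purity": [0.9, 0.95]})

# Mix a column name and a positional index (2 -> "ph").
new_inputs, new_responses = filter_dataset(inputs, responses, input_selector=["temp", 2])
```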

scale(scale_responses: bool = False) → ModifiableDataSet

Apply an orthogonal scaling to the dataset, i.e. a linear scaling so that each column lies in the range -1 to 1.

Parameters:
  • scale_responses (bool) – Whether to scale the response data as well (default: False)

Returns:

this dataset

Return type:

ModifiableDataSet
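A linear map onto -1 to 1 sends each column's minimum to -1, its maximum to +1 and its midpoint to 0, i.e. x' = (x - centre) / half_range with centre = (max + min) / 2 and half_range = (max - min) / 2. The sketch below implements that formula in pandas. scale_columns and scale_dataset are hypothetical helpers, and mapping constant columns to 0 (instead of dividing by zero) is a choice made for this sketch, not necessarily DoENUT's behaviour.

```python
import pandas as pd


def scale_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Linearly map each column onto -1..1 (min -> -1, midpoint -> 0, max -> +1)."""
    col_min, col_max = df.min(), df.max()
    centre = (col_max + col_min) / 2
    half_range = (col_max - col_min) / 2
    # Sketch-specific choice: a constant column has zero range, so divide by 1
    # instead, which maps that column to 0.
    half_range = half_range.where(half_range != 0, 1.0)
    return (df - centre) / half_range


def scale_dataset(inputs: pd.DataFrame, responses: pd.DataFrame, scale_responses: bool = False):
    # Inputs are always scaled; responses only when asked, as with scale().
    new_responses = scale_columns(responses) if scale_responses else responses
    return scale_columns(inputs), new_responses


inputs = pd.DataFrame({"temp": [20.0, 40.0, 60.0], "time": [1.0, 2.0, 5.0]})
responses = pd.DataFrame({"yield": [0.5, 0.7, 0.9]})
scaled_inputs, same_responses = scale_dataset(inputs, responses)
```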

drop_duplicates() → ModifiableDataSet

Removes all duplicate rows from the dataset, keeping the first instance of each. NOTE: only the inputs are considered when deciding whether a row is a duplicate, but duplicates are removed from both the inputs and the responses.

Returns:

self

Return type:

ModifiableDataSet
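The duplicate-removal rule above maps naturally onto pandas' DataFrame.duplicated(keep="first"): build a mask from the inputs alone and apply it to both frames. drop_duplicate_rows is a hypothetical helper, and the sketch assumes inputs and responses have the same number of rows in matching order.

```python
import pandas as pd


def drop_duplicate_rows(inputs: pd.DataFrame, responses: pd.DataFrame):
    # A row counts as a duplicate when its *inputs* match an earlier row;
    # responses are ignored. keep="first" marks all but the first as duplicates.
    keep = ~inputs.duplicated(keep="first").to_numpy()
    # Apply the same positional mask to both frames so rows stay paired.
    return inputs[keep], responses[keep]


inputs = pd.DataFrame({"temp": [20, 40, 20], "time": [1, 2, 1]})
responses = pd.DataFrame({"yield": [0.5, 0.7, 0.6]})
new_inputs, new_responses = drop_duplicate_rows(inputs, responses)
```

Note that the third row is dropped even though its response (0.6) differs from the first row's, because only the inputs decide what counts as a duplicate.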

average_duplicates() → ModifiableDataSet

Removes all duplicate rows from the dataset. The first instance of each duplicate is kept, and its responses are set to the average of all the rows that match it. NOTE: only the input values are considered when deciding whether a row is a duplicate.

Returns:

self

Return type:

ModifiableDataSet
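Averaging duplicates can be sketched with a groupby on the input columns. average_duplicate_rows is a hypothetical helper, not DoENUT's code; the sketch assumes the response columns are numeric, that the inputs contain no missing values (groupby drops NaN keys by default), and that no response column shares a name with an input column.

```python
import pandas as pd


def average_duplicate_rows(inputs: pd.DataFrame, responses: pd.DataFrame):
    input_cols = list(inputs.columns)
    response_cols = list(responses.columns)
    # Place inputs and responses side by side, paired by row position.
    combined = pd.concat(
        [inputs.reset_index(drop=True), responses.reset_index(drop=True)], axis=1
    )
    # Rows with identical inputs form one group. sort=False keeps the groups in
    # order of first appearance; mean() averages the (numeric) responses.
    averaged = combined.groupby(input_cols, sort=False).mean().reset_index()
    # Reuse the index labels of the first row in each group.
    first = ~inputs.duplicated(keep="first").to_numpy()
    averaged.index = inputs.index[first]
    return averaged[input_cols], averaged[response_cols]


inputs = pd.DataFrame({"temp": [20, 40, 20], "time": [1, 2, 1]})
responses = pd.DataFrame({"yield": [0.5, 0.7, 1.5]})
new_inputs, new_responses = average_duplicate_rows(inputs, responses)
```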

class doenut.data.DataSet(inputs: pandas.DataFrame, responses: pandas.DataFrame)

A dataset that has had all its modifiers applied.

Parameters:
  • inputs (pd.DataFrame) – The dataset’s inputs

  • responses (pd.DataFrame) – The dataset’s responses

get_inputs() → pandas.DataFrame

Returns the dataset’s inputs.

get_responses() → pandas.DataFrame

Returns the dataset’s responses.