doenut.data.modifiers

DoENUT: data.modifiers

These classes provide ways to manipulate a dataset (filtering, scaling, etc.).

Package Contents

Classes

OrthoScaler

Takes a dataset and scales it per column using an ortho scaling to the range -1 … 1

ColumnSelector

DataSet Modifier to remove columns from the dataset

DuplicateRemover

Parses a dataset and removes all but the first instance of any row that has duplicate values for the inputs.

DuplicateAverager

Parses a dataset and removes all but the first instance of any row that has duplicate values for the inputs, averaging the responses of the duplicates.

class doenut.data.modifiers.OrthoScaler(inputs: pandas.DataFrame, responses: pandas.DataFrame, scale_responses: bool = False)

Bases: doenut.data.modifiers.data_set_modifier.DataSetModifier

Takes a dataset and scales it per column using an ortho scaling to the range -1 … 1

Parameters:
  • inputs (pd.DataFrame) – The dataset’s inputs

  • responses (pd.DataFrame) – The dataset’s responses

  • scale_responses (bool, default False) – Whether to also scale the responses.

classmethod _compute_scaling(data: pandas.DataFrame) → Tuple[float, float]

apply_to_inputs(data: pandas.DataFrame) → pandas.DataFrame

Applies the modifier to the inputs of the dataset.

Parameters:

data (pd.DataFrame) – The input data

Returns:

The modified input data

Return type:

pd.DataFrame

apply_to_responses(data: pandas.DataFrame) → pandas.DataFrame

Applies the modifier to the responses of the dataset.

Parameters:

data (pd.DataFrame) – The response data

Returns:

The modified response data

Return type:

pd.DataFrame
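
As a rough illustration of per-column scaling to the range -1 … 1, the sketch below uses plain pandas rather than OrthoScaler itself. The helper name ortho_scale and the midpoint / half-range formula are assumptions made for the example; the documentation above only states the target range, not how the scaling is computed.

```python
import pandas as pd


def ortho_scale(data: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: map each column's minimum to -1 and maximum to 1."""
    col_min = data.min()
    col_max = data.max()
    midpoint = (col_max + col_min) / 2    # centre of each column's range
    half_range = (col_max - col_min) / 2  # distance from the centre to either end
    # A constant column has half_range == 0 and would produce NaN here.
    return (data - midpoint) / half_range


inputs = pd.DataFrame({"temp": [20.0, 40.0, 60.0], "ph": [3.0, 5.0, 7.0]})
scaled = ortho_scale(inputs)
```

Each column is scaled independently, so columns with very different units end up on a comparable footing.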

class doenut.data.modifiers.ColumnSelector(inputs: pandas.DataFrame, responses: pandas.DataFrame, input_selector: List[str | int] = None, response_selector: List[str | int] = None)

Bases: doenut.data.modifiers.data_set_modifier.DataSetModifier

DataSet Modifier to remove columns from the dataset

Parameters:
  • inputs (pd.DataFrame) – The dataset’s inputs

  • responses (pd.DataFrame) – The dataset’s responses

  • input_selector (List[str | int], optional) – A list to filter the inputs by

  • response_selector (List[str | int], optional) – A list to filter the responses by

Warning

At least one of input_selector and response_selector must be specified.

classmethod _parse_selector(data: pandas.DataFrame, selector: List[str | int]) → Tuple[List[str], List[int]]

Internal helper function to take either a list of column names or column indices and convert it to the other.

Parameters:
  • data (pd.DataFrame) – The data set the list applies to

  • selector (List[str | int]) – The known selector list

Returns:

  • List[str] – The list of column names selected

  • List[int] – The list of column indices selected
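
The name-to-index conversion described above could look something like the following. This is a standalone sketch, not the library's implementation: parse_selector is a hypothetical function, and handling each entry individually (so that names and indices may be mixed) is an assumption.

```python
from typing import List, Tuple, Union

import pandas as pd


def parse_selector(
    data: pd.DataFrame, selector: List[Union[str, int]]
) -> Tuple[List[str], List[int]]:
    """Hypothetical helper: resolve a selector to both column names and indices."""
    columns = list(data.columns)
    names: List[str] = []
    indices: List[int] = []
    for item in selector:
        if isinstance(item, int):
            # Entry is a positional index: look up the matching column name.
            indices.append(item)
            names.append(columns[item])
        else:
            # Entry is a column name: look up its position.
            names.append(item)
            indices.append(columns.index(item))
    return names, indices


inputs = pd.DataFrame({"temp": [20, 40], "ph": [3, 5], "time": [1, 2]})
names, indices = parse_selector(inputs, ["time", 0])
selected = inputs[names]  # keeps only the selected columns, in selector order
```

An unknown column name raises ValueError and an out-of-range index raises IndexError in this sketch.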

apply_to_inputs(data: pandas.DataFrame) → pandas.DataFrame

Applies the modifier to the inputs of the dataset.

Parameters:

data (pd.DataFrame) – The input data

Returns:

The modified input data

Return type:

pd.DataFrame

apply_to_responses(data: pandas.DataFrame) → pandas.DataFrame

Applies the modifier to the responses of the dataset.

Parameters:

data (pd.DataFrame) – The response data

Returns:

The modified response data

Return type:

pd.DataFrame

class doenut.data.modifiers.DuplicateRemover(inputs: pandas.DataFrame, responses: pandas.DataFrame)

Bases: doenut.data.modifiers.data_set_modifier.DataSetModifier

Parses a dataset and removes all but the first instance of any row that has duplicate values for the inputs. The corresponding rows in the responses are removed as well.

Parameters:
  • inputs (pd.DataFrame) – The dataset’s inputs

  • responses (pd.DataFrame) – The dataset’s responses

classmethod _get_duplicate_rows(data: pandas.DataFrame) → Dict[int, Set[int]]

classmethod _get_non_duplicate_rows(data: pandas.DataFrame, duplicates_dict: Dict[int, Iterable[int]] = None) → List[int]

apply_to_inputs(data: pandas.DataFrame) → pandas.DataFrame

Applies the modifier to the inputs of the dataset.

Parameters:

data (pd.DataFrame) – The input data

Returns:

The modified input data

Return type:

pd.DataFrame

apply_to_responses(data: pandas.DataFrame) → pandas.DataFrame

Applies the modifier to the responses of the dataset.

Parameters:

data (pd.DataFrame) – The response data

Returns:

The modified response data

Return type:

pd.DataFrame
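
The keep-the-first-instance behaviour can be approximated in plain pandas as shown below. This sketch does not call DuplicateRemover; it assumes that inputs and responses share the same row index, and the column names are made up for the example.

```python
import pandas as pd

inputs = pd.DataFrame({"temp": [20, 40, 20, 60], "ph": [3, 5, 3, 7]})
responses = pd.DataFrame({"yield": [10.0, 20.0, 30.0, 40.0]})

# True for the first occurrence of each distinct input row, False for repeats.
keep = ~inputs.duplicated(keep="first")

unique_inputs = inputs.loc[keep]
# Drop the same rows from the responses so the two frames stay aligned.
unique_responses = responses.loc[keep]
```

Row 2 repeats the inputs of row 0, so it is dropped from both frames while row 0 keeps its original response.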

class doenut.data.modifiers.DuplicateAverager(inputs: pandas.DataFrame, responses: pandas.DataFrame)

Bases: doenut.data.modifiers.duplicate_remover.DuplicateRemover

Parses a dataset and removes all but the first instance of any row that has duplicate values for the inputs. The corresponding rows in the responses are also removed, and the remaining response is replaced with the average of the duplicates’ values.

Parameters:
  • inputs (pd.DataFrame) – The dataset’s inputs

  • responses (pd.DataFrame) – The dataset’s responses

classmethod _apply(data: pandas.DataFrame, duplicate_dict: Dict[int, Iterable[int]], non_duplicate_rows: List[int]) → pandas.DataFrame

apply_to_inputs(data: pandas.DataFrame) → pandas.DataFrame

Applies the modifier to the inputs of the dataset.

Parameters:

data (pd.DataFrame) – The input data

Returns:

The modified input data

Return type:

pd.DataFrame

apply_to_responses(data: pandas.DataFrame) → pandas.DataFrame

Applies the modifier to the responses of the dataset.

Parameters:

data (pd.DataFrame) – The response data

Returns:

The modified response data

Return type:

pd.DataFrame
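
One way to picture the averaging step is the pandas sketch below. It is not DuplicateAverager's own code. It assumes that inputs and responses share a row index, that the inputs contain no missing values, and that the average is taken over every row in a duplicate group, including the first.

```python
import pandas as pd

inputs = pd.DataFrame({"temp": [20, 40, 20, 60], "ph": [3, 5, 3, 7]})
responses = pd.DataFrame({"yield": [10.0, 20.0, 30.0, 40.0]})

# Label each row with an id shared by all rows that have identical inputs.
group_ids = inputs.groupby(list(inputs.columns), sort=False).ngroup()

# Replace every response with the mean of its duplicate group (index is preserved).
group_means = responses.groupby(group_ids).transform("mean")

# Keep only the first instance of each distinct input row.
keep = ~inputs.duplicated(keep="first")
averaged_inputs = inputs.loc[keep]
averaged_responses = group_means.loc[keep]
```

Rows 0 and 2 share the same inputs, so the surviving row 0 carries the mean of their two responses.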