doenut.data.modifiers.duplicate_remover

Module Contents

Classes

DuplicateRemover

Parses a dataset and removes all but the first instance of any row that has duplicate values for the inputs.

class doenut.data.modifiers.duplicate_remover.DuplicateRemover(inputs: pandas.DataFrame, responses: pandas.DataFrame)

Bases: doenut.data.modifiers.data_set_modifier.DataSetModifier

Parses a dataset and removes all but the first instance of any row that has duplicate values for the inputs. The corresponding rows in the responses are removed as well.

Parameters:
  • inputs (pd.DataFrame) – The dataset’s inputs

  • responses (pd.DataFrame) – The dataset’s responses
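
The behaviour described above can be illustrated with plain pandas. This is a minimal sketch only, not doenut's implementation: it assumes that "duplicate" means every input column matches an earlier row, and the column names and values are made up for the example.

```python
import pandas as pd

# Hypothetical example data; column names and values are illustrative only.
inputs = pd.DataFrame(
    {"temperature": [20, 30, 20, 40], "pressure": [1.0, 2.0, 1.0, 2.0]}
)
responses = pd.DataFrame({"yield": [0.51, 0.62, 0.49, 0.70]})

# Flag every row whose input values repeat an earlier row.
is_duplicate = inputs.duplicated(keep="first")

# Keep the first occurrence of each input row, and drop the response rows
# at the same positions so the two frames stay aligned.
deduped_inputs = inputs.loc[~is_duplicate]
deduped_responses = responses.loc[~is_duplicate.to_numpy()]

print(deduped_inputs)
print(deduped_responses)
```

Here the third row repeats the inputs of the first row, so it is dropped from both frames even though its response value differs.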

classmethod _get_duplicate_rows(data: pandas.DataFrame) → Dict[int, Set[int]]

classmethod _get_non_duplicate_rows(data: pandas.DataFrame, duplicates_dict: Dict[int, Iterable[int]] = None) → List[int]

apply_to_inputs(data: pandas.DataFrame) → pandas.DataFrame

Applies the modifier to the inputs of the dataset.

Parameters:

data (pd.DataFrame) – The input data

Returns:

The modified input data

Return type:

pd.DataFrame

apply_to_responses(data: pandas.DataFrame) → pandas.DataFrame

Applies the modifier to the responses of the dataset.

Parameters:

data (pd.DataFrame) – The response data

Returns:

The modified response data

Return type:

pd.DataFrame