doenut.models.averaged_model

Module Contents

Classes

AveragedModel

Model scored as the average of multiple models generated from a single set of inputs via a leave-one-out approach.

Attributes

logger

doenut.models.averaged_model.logger
class doenut.models.averaged_model.AveragedModel(data: doenut.data.modifiable_data_set.ModifiableDataSet, scale_data: bool = True, scale_run_data: bool = True, fit_intercept: bool = True, response_key: str = None, drop_duplicates: str = 'yes')

Bases: doenut.models.model.Model

Model scored as the average of multiple models generated from a single set of inputs via a leave-one-out approach.
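The leave-one-out idea can be illustrated with a minimal, standard-library-only sketch. This is not doenut's implementation: the helper names `fit_line` and `leave_one_out_average` are hypothetical, the model is a single-input straight line, and averaging the fitted coefficients is an assumption made for illustration (this page does not say exactly which quantity doenut averages).

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def leave_one_out_average(xs, ys):
    """Fit one line per left-out point, then average the coefficients."""
    fits = []
    for i in range(len(xs)):
        # Drop the i-th observation and fit on the remaining points.
        sub_x = xs[:i] + xs[i + 1:]
        sub_y = ys[:i] + ys[i + 1:]
        fits.append(fit_line(sub_x, sub_y))
    slopes, intercepts = zip(*fits)
    return mean(slopes), mean(intercepts)


xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]  # points lying on y = 2x + 1
slope, intercept = leave_one_out_average(xs, ys)
```

Because every point lies on the same line, each sub-model recovers roughly the same coefficients, so the averaged slope and intercept come out close to 2 and 1.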

Parameters:
  • data (doenut.data.ModifiableDataSet) – the data to run / test against.

  • scale_data (bool, default True) – Whether to scale the overall data before running it.

  • scale_run_data (bool, default True) – Whether to normalise the data for each run.

  • fit_intercept (bool, default True) – Whether to fit an intercept term rather than fixing it at zero.

  • response_key (str, optional) – for multi-column responses, which one to test on

  • drop_duplicates ({'yes', 'drop', 'average'}, default 'yes') – Whether to drop duplicate values. With ‘average’, the duplicates are dropped and the one left has its response value(s) set to the average of all the duplicates.
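The ‘average’ behaviour described above can be sketched in plain Python. This is an illustration only, not doenut's code: the function `average_duplicates` and the list-of-rows layout are hypothetical, and treating rows as duplicates when their input values match exactly is an assumption.

```python
from collections import defaultdict
from statistics import mean


def average_duplicates(inputs, responses):
    """Collapse rows with identical inputs, averaging their responses."""
    grouped = defaultdict(list)
    for row, response in zip(inputs, responses):
        # Rows with identical input values share one key;
        # dicts keep keys in first-seen order.
        grouped[tuple(row)].append(response)
    unique_inputs = [list(key) for key in grouped]
    averaged = [mean(values) for values in grouped.values()]
    return unique_inputs, averaged


inputs = [[1, 2], [3, 4], [1, 2]]
responses = [10.0, 20.0, 14.0]
unique_inputs, averaged = average_duplicates(inputs, responses)
```

Here the two `[1, 2]` rows collapse into one whose response is the mean of 10.0 and 14.0, while the `[3, 4]` row is left unchanged.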

classmethod tune_model(data: doenut.data.modifiable_data_set.ModifiableDataSet, fit_intercept: bool = True, response_key: str = None, drop_duplicates: str = 'yes') → Tuple[AveragedModel, AveragedModel]

Generate a pair of models from the same set of data: one using scaled data, the other unscaled.

The scaled model can then be used to determine which columns to drop for later models, and the unscaled model to check the model's performance against validation data (or simply for use once tuning is done).

Parameters:
  • data (doenut.data.ModifiableDataSet) – The dataset to test against. This should be unscaled.

  • fit_intercept (bool, default True) – Whether to fit the intercept or not (usually yes)

  • response_key (str, optional) – If there is more than one response column, which one to use.

  • drop_duplicates ({'yes', 'drop', 'average'}, default 'yes') – Whether to drop duplicate values. With ‘average’, the duplicates are dropped and the one left has its response value(s) set to the average of all the duplicates.

Returns:

  • AveragedModel – The generated scaled model

  • AveragedModel – The generated unscaled model
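A call to `tune_model` might look like the sketch below, which relies only on the signature documented above. The wrapper `build_tuned_models` is a hypothetical helper, and `dataset` is assumed to be an unscaled `ModifiableDataSet` constructed elsewhere (its constructor is not covered on this page).

```python
def build_tuned_models(dataset, response_key=None):
    """Return the (scaled, unscaled) pair produced by AveragedModel.tune_model.

    `dataset` is assumed to be an unscaled ModifiableDataSet built elsewhere.
    """
    # Imported lazily so this helper can be defined without doenut installed.
    from doenut.models.averaged_model import AveragedModel

    scaled_model, unscaled_model = AveragedModel.tune_model(
        dataset,
        fit_intercept=True,
        response_key=response_key,
        drop_duplicates="average",
    )
    return scaled_model, unscaled_model
```

Unpacking the tuple in the documented order keeps the two roles distinct: the first model is the scaled one, the second the unscaled one.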