doenut.models.averaged_model
Module Contents
Classes
Model scored as the average of multiple models generated from a single |
Attributes
- doenut.models.averaged_model.logger
- class doenut.models.averaged_model.AveragedModel(data: doenut.data.modifiable_data_set.ModifiableDataSet, scale_data: bool = True, scale_run_data: bool = True, fit_intercept: bool = True, response_key: str = None, drop_duplicates: str = 'yes')[source]
Bases:
doenut.models.model.ModelModel scored as the average of multiple models generated from a single set of inputs via a leave-one-out approach.
- Parameters:
data (doenut.data.ModifiableDataSet) – the data to run / test against.
scale_data (bool, default True) – Whether to scale the overall data before running it.
scale_run_data (bool, default True) – Whether to normalise the data for each run
fit_intercept (bool, default True) – Whether to fit the intercept to zero
response_key (str, optional) – for multi-column responses, which one to test on
drop_duplicates ({'yes', 'drop', 'average'}) – whether to drop duplicate values or not. May also be ‘average’ which will cause them to be dropped, but the one left will have its response value(s) set to the average of all the duplicates.
- classmethod tune_model(data: doenut.data.modifiable_data_set.ModifiableDataSet, fit_intercept: bool = True, response_key: str = None, drop_duplicates: str = 'yes') Tuple[AveragedModel, AveragedModel][source]
Generate a pair of models from the same set of data. One using scaled data the other unscaled.
The scaled model can then be used for determining which columns to drop for later models, and the unscaled model for checking the models performance against validation data (or just for using once done).
- Parameters:
data (doenut.data.ModifiableDataSet) – The dataset to test against. This should be unscaled.
fit_intercept (bool, default True) – Whether to fit the intercept or not (usually yes)
response_key (str, optional) – If there are more than one response columns, which to use.
drop_duplicates ({'yes', 'drop', 'average'}) – whether to drop duplicate values or not. May also be ‘average’ which will cause them to be dropped, but the one left will have its response value(s) set to the average of all the duplicates.
- Returns:
AveragedModel – The generated scaled model
AveragedModel – The generated unscaled model