etna.auto.Auto

class Auto(target_metric: Metric, horizon: int, metric_aggregation: Literal['median', 'mean', 'std', 'notna_size', 'percentile_5', 'percentile_25', 'percentile_75', 'percentile_95'] = 'mean', backtest_params: dict | None = None, experiment_folder: str | None = None, pool: Pool | PoolGenerator | List[BasePipeline] = Pool.default, runner: AbstractRunner | None = None, storage: BaseStorage | None = None, metrics: List[Metric] | None = None)

Bases: AutoBase

Automatic pipeline selection from a predefined or custom pipeline pool.

Note

This class requires the auto extension to be installed. Read more about this on the installation page.

Initialize Auto class.

Parameters:
  • target_metric (Metric) – Metric to optimize.

  • horizon (int) – Horizon to forecast for.

  • metric_aggregation (Literal['median', 'mean', 'std', 'notna_size', 'percentile_5', 'percentile_25', 'percentile_75', 'percentile_95']) – Aggregation method for per-segment metrics. By default, mean aggregation is used.

  • backtest_params (dict | None) – Custom parameters for backtest instead of default backtest parameters.

  • experiment_folder (str | None) – Name for saving experiment results; it also determines the name of the optuna study. Not set by default.

  • pool (Pool | PoolGenerator | List[BasePipeline]) – Pool of pipelines to choose from. By default, the default pool from Pool is used.

  • runner (AbstractRunner | None) – Runner to use for distributed training. By default, LocalRunner is used.

  • storage (BaseStorage | None) – Optuna storage to use. By default, sqlite storage is used.

  • metrics (List[Metric] | None) – List of metrics to compute. By default, the Sign, SMAPE, MAE, MSE and MedAE metrics are used.
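The metric_aggregation options can be illustrated with a small standalone sketch. It is not the library's implementation: the helper names aggregate and percentile are hypothetical, the percentile uses linear interpolation, 'std' is computed as the sample standard deviation, and missing values are dropped before every aggregation. All of these are assumptions made for illustration.

```python
import math
import statistics
from typing import Dict, List


def percentile(values: List[float], q: float) -> float:
    """Percentile of ``values`` for ``q`` in [0, 100], using linear interpolation."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100
    lower, upper = math.floor(pos), math.ceil(pos)
    return ordered[lower] + (ordered[upper] - ordered[lower]) * (pos - lower)


def aggregate(per_segment: Dict[str, float], mode: str = "mean") -> float:
    """Collapse per-segment metric values into a single number (hypothetical helper)."""
    # Assumption: NaN values are ignored; 'notna_size' counts what is left.
    values = [v for v in per_segment.values() if not math.isnan(v)]
    if mode == "notna_size":
        return float(len(values))
    if mode == "mean":
        return statistics.fmean(values)
    if mode == "median":
        return statistics.median(values)
    if mode == "std":
        return statistics.stdev(values)  # sample standard deviation (assumption)
    if mode.startswith("percentile_"):
        return percentile(values, float(mode.split("_")[1]))
    raise ValueError(f"Unknown aggregation mode: {mode}")


# Per-segment values of some metric; one segment has no value.
smape_by_segment = {"a": 10.0, "b": 20.0, "c": 30.0, "d": float("nan")}
mean_value = aggregate(smape_by_segment, "mean")
p25_value = aggregate(smape_by_segment, "percentile_25")
```

The aggregated value is the single number that the selection procedure would compare between pipelines, so the choice of mode decides whether typical segments (mean, median) or the tails (percentiles) drive the ranking.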

Methods

fit(ts[, timeout, n_trials, initializer, ...])

Start automatic pipeline selection.

objective(ts, target_metric, ...[, ...])

Optuna objective wrapper for the pool stage.

summary()

Get Auto trials summary.

top_k([k])

Get top k pipelines with the best metric value.

fit(ts: TSDataset, timeout: int | None = None, n_trials: int | None = None, initializer: _Initializer | None = None, callback: _Callback | None = None, **kwargs) → BasePipeline

Start automatic pipeline selection.

There are two stages:

  • Pool stage: trying every pipeline in the pool.

  • Tuning stage: tuning the tune_size best pipelines from the previous stage using Tune.

The tuning stage starts only if the limits on n_trials and timeout have not already been exceeded. Tuning proceeds from the best pipeline to the worst, and the trial limits (n_trials, timeout) are divided evenly between the pipelines. If there is no limit on the number of trials, only the first pipeline is tuned, and tuning continues until the user stops the process.
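The budget rules above can be sketched as a small planning function. This is an illustrative reading of the description, not the library's code: the name plan_tuning, the used_trials and used_seconds arguments, the idea that the budget left after the pool stage is what gets split, the rounding up of the per-pipeline trial count, and the treatment of "no limits" as "neither n_trials nor timeout is set" are all assumptions.

```python
import math
from typing import List, Optional, Tuple

Budget = Tuple[Optional[int], Optional[float]]  # (n_trials, timeout in seconds)


def plan_tuning(
    tune_size: int,
    n_trials: Optional[int] = None,
    timeout: Optional[float] = None,
    used_trials: int = 0,
    used_seconds: float = 0.0,
) -> List[Budget]:
    """Return one budget per pipeline to tune, best pipeline first (hypothetical helper)."""
    if tune_size <= 0:
        return []
    trials_left = None if n_trials is None else n_trials - used_trials
    seconds_left = None if timeout is None else timeout - used_seconds
    # The tuning stage is skipped once any limit is already used up.
    if trials_left is not None and trials_left <= 0:
        return []
    if seconds_left is not None and seconds_left <= 0:
        return []
    # Without any limit only the first pipeline is tuned, until interrupted.
    if trials_left is None and seconds_left is None:
        return [(None, None)]
    # Split what is left evenly; rounding up is an assumption of this sketch.
    per_trials = None if trials_left is None else math.ceil(trials_left / tune_size)
    per_seconds = None if seconds_left is None else seconds_left / tune_size
    return [(per_trials, per_seconds)] * tune_size


# 30 trials in total, 10 already spent on the pool, 2 pipelines to tune.
plan = plan_tuning(tune_size=2, n_trials=30, used_trials=10)
```

In this sketch a pool that consumes the whole budget yields an empty plan, which mirrors the statement that the tuning stage does not start once the limits are exceeded.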

Parameters:
  • ts (TSDataset) – TSDataset to fit on.

  • timeout (int | None) – Timeout for optuna. Note that this is the timeout for each worker. Not set by default.

  • n_trials (int | None) – Number of trials for optuna. Note that this is the number of trials for each worker. Not set by default.

  • initializer (_Initializer | None) – Object that is called before each pipeline backtest; can be used to initialize loggers.

  • callback (_Callback | None) – Object that is called after each pipeline backtest; can be used to log extra metrics.

  • **kwargs – The tune_size parameter (default: 0) determines how many pipelines to fit during the tuning stage. Other parameters are passed to optuna.study.Study.optimize().

Return type:

BasePipeline

static objective(ts: TSDataset, target_metric: Metric, metric_aggregation: Literal['median', 'mean', 'std', 'notna_size', 'percentile_5', 'percentile_25', 'percentile_75', 'percentile_95'], metrics: List[Metric], backtest_params: dict, initializer: _Initializer | None = None, callback: _Callback | None = None) → Callable[[Trial], float]

Optuna objective wrapper for the pool stage.

Parameters:
  • ts (TSDataset) – TSDataset to fit on.

  • target_metric (Metric) – Metric to optimize.

  • metric_aggregation (Literal['median', 'mean', 'std', 'notna_size', 'percentile_5', 'percentile_25', 'percentile_75', 'percentile_95']) – Aggregation method for per-segment metrics.

  • metrics (List[Metric]) – List of metrics to compute.

  • backtest_params (dict) – Custom parameters for backtest instead of default backtest parameters.

  • initializer (_Initializer | None) – Object that is called before each pipeline backtest; can be used to initialize loggers.

  • callback (_Callback | None) – Object that is called after each pipeline backtest; can be used to log extra metrics.

Returns:

Function that runs the specified trial and returns its evaluated score.

Return type:

Callable[[Trial], float]
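The wrapper pattern described here, a factory that returns a function taking a trial and returning a score, can be sketched without any dependencies. Everything in the block is hypothetical: FakeTrial stands in for an optuna trial, make_objective and the evaluate argument replace the real backtest, and the initializer and callback signatures are simplified and do not match the library's _Initializer and _Callback protocols.

```python
from typing import Callable, Dict, Optional


class FakeTrial:
    """Hypothetical stand-in for an optuna trial that only carries a pipeline name."""

    def __init__(self, pipeline_name: str) -> None:
        self.pipeline_name = pipeline_name
        self.user_attrs: Dict[str, float] = {}

    def set_user_attr(self, key: str, value: float) -> None:
        self.user_attrs[key] = value


def make_objective(
    evaluate: Callable[[str], float],
    initializer: Optional[Callable[[str], None]] = None,
    callback: Optional[Callable[[str, float], None]] = None,
) -> Callable[[FakeTrial], float]:
    """Build an objective that scores the pipeline referenced by a trial."""

    def _objective(trial: FakeTrial) -> float:
        name = trial.pipeline_name
        if initializer is not None:
            initializer(name)  # runs before the evaluation, e.g. to set up logging
        score = evaluate(name)  # stands in for the backtest plus metric aggregation
        if callback is not None:
            callback(name, score)  # runs after the evaluation, e.g. to log extras
        trial.set_user_attr("score", score)
        return score

    return _objective


# Precomputed scores play the role of backtest results in this sketch.
scores = {"naive": 12.5, "seasonal": 8.0}
events = []
objective = make_objective(
    scores.__getitem__,
    initializer=lambda name: events.append(("init", name)),
    callback=lambda name, score: events.append(("done", name, score)),
)
trial = FakeTrial("seasonal")
result = objective(trial)
```

Returning a closure keeps the dataset, metrics and hooks bound once, so the optimizer only has to supply the trial.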

summary() → DataFrame

Get Auto trials summary.

The summary contains the following columns:

  • hash: hash of the pipeline;

  • pipeline: pipeline object;

  • metrics: columns with metrics’ values;

  • state: state of the trial;

  • study: name of the study in which the trial was made.

Returns:

DataFrame with detailed information on each performed trial.

Return type:

DataFrame

top_k(k: int = 5) → List[BasePipeline]

Get top k pipelines with the best metric value.

Only complete and non-duplicate studies are taken into account.

Parameters:

k (int) – Number of pipelines to return.

Returns:

List of top k pipelines.

Return type:

List[BasePipeline]
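The selection rule can be sketched on plain records. The record layout (hash, pipeline, value, state), the helper name select_top_k, the "COMPLETE" state label, keeping the first occurrence of a duplicated hash, and treating lower metric values as better by default are assumptions of this illustration, not the library's implementation.

```python
from typing import Any, Dict, List


def select_top_k(
    trials: List[Dict[str, Any]], k: int = 5, greater_is_better: bool = False
) -> List[Any]:
    """Return the pipelines of the k best complete, non-duplicate trials."""
    seen_hashes = set()
    candidates = []
    for trial in trials:
        # Skip failed or unfinished trials and repeated pipelines.
        if trial["state"] != "COMPLETE" or trial["hash"] in seen_hashes:
            continue
        seen_hashes.add(trial["hash"])
        candidates.append(trial)
    # Sort by the optimized metric; the direction depends on the metric.
    candidates.sort(key=lambda t: t["value"], reverse=greater_is_better)
    return [t["pipeline"] for t in candidates[:k]]


# Pipelines are represented by plain strings in this sketch.
trials = [
    {"hash": "a", "pipeline": "naive", "value": 12.0, "state": "COMPLETE"},
    {"hash": "b", "pipeline": "catboost", "value": 7.5, "state": "COMPLETE"},
    {"hash": "b", "pipeline": "catboost", "value": 7.5, "state": "COMPLETE"},
    {"hash": "c", "pipeline": "prophet", "value": 5.0, "state": "FAIL"},
    {"hash": "d", "pipeline": "arima", "value": 9.0, "state": "COMPLETE"},
]
best_two = select_top_k(trials, k=2)
```

Here the failed trial is ignored even though its value is the lowest, and the duplicated pipeline is counted once.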