
How do you predict multiple time series at once?

Predicting multiple time series at once is a common task in many real-world applications. Companies often want to forecast sales, website traffic, stock prices and other metrics simultaneously. There are several key considerations when developing multi-time series forecasting models.

What is multi-time series forecasting?

Multi-time series forecasting involves predicting the future values of two or more related time series simultaneously. The goal is to leverage the relationships and patterns between the series to improve forecast accuracy. For example, a retailer may want to forecast weekly sales for multiple products at once, taking into account product seasonality, promotions, and cross-effects between related products.

Why predict time series together?

There are several key advantages to modeling time series together rather than independently:

  • Captures relationships between series – Covariation, lagged effects, and other inter-series dynamics can be modeled.
  • Improves accuracy – Related time series contain useful information that can improve forecasting accuracy.
  • Provides coherent forecasts – Ensures forecasts for related series are coherent and consistent with each other.
  • Efficient modeling – Avoids duplicating work by modeling series jointly rather than one at a time.

In summary, jointly modeling time series allows the model to learn from patterns across the data, improving accuracy and ensuring coherent forecasts.

Challenges of multi-time series forecasting

While forecasting multiple interrelated time series has advantages, it also comes with some unique modeling challenges:

  • Complexity – Modeling many series jointly increases dimensionality and complexity.
  • Computational resources – Model estimation and prediction require more computational time and resources.
  • Overfitting – With more parameters, overfitting models to noise is a risk.
  • Interpretability – Understanding drivers and dynamics across many series can be difficult.

Careful model specification, regularization, and validation help mitigate these issues when developing multi-time series models.

Data preparation

As with any machine learning task, adequate data preparation and feature engineering are crucial for multi-time series forecasting. Important considerations include:

  • Handling missing data – Impute, interpolate, or discard missing observations.
  • Handling uneven lengths – Pad or truncate time series to equal length.
  • Normalizing scale – Rescale data to comparable numeric ranges.
  • Detrending – Remove trend and seasonality to stationarize the data.
  • Lag features – Add lagged observations as predictor variables.
  • Rolling statistics – Calculate rolling means, variances, etc. as features.

Proper data formatting, feature selection, and feature engineering help ensure the model can effectively learn across the multiple time series.
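Two of the preparation steps above, lag features and rolling statistics, can be sketched with numpy. The series values and window sizes here are hypothetical, chosen only to illustrate the alignment between features and targets:

```python
import numpy as np

# Toy weekly sales series (hypothetical values for illustration).
sales = np.array([10.0, 12.0, 11.0, 13.0, 15.0, 14.0])

def make_lag_features(series, n_lags):
    """Column k holds lag k+1; row t aligns with target series[t + n_lags]."""
    return np.column_stack([series[n_lags - (k + 1) : len(series) - (k + 1)]
                            for k in range(n_lags)])

def rolling_mean(series, window):
    """Trailing rolling mean; output aligns with series[window - 1:]."""
    return np.convolve(series, np.ones(window) / window, mode="valid")

lags = make_lag_features(sales, n_lags=2)
print(lags[0])                 # [12. 10.] -> lags 1 and 2 for target sales[2]
print(rolling_mean(sales, 3))  # [11. 12. 13. 14.]
```

The key detail is alignment: each row of lag features must line up with the target observation it predicts, otherwise the model silently trains on leaked or shifted data.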

Classical statistical models

Classical statistical methods like VAR and transfer functions have traditionally been used for multi-time series forecasting:

  • Vector autoregression (VAR) – Predicts future values of multiple series based on lagged values of all series. Captures inter-series dynamics through a lagged linear combination.
  • Transfer functions – Models relationship between input and output series via lagged effects, trends, and seasonality components.

These methods estimate interpretable parameters but rely on rigid assumptions of linearity and stationarity.

Example VAR Model

Time Series    Lagged Values
Sales          Sales(-1), Web Traffic(-1)
Web Traffic    Sales(-1), Web Traffic(-1)

The VAR model above uses lagged values of both time series as predictors for each series.
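A VAR(1) model of this form can be estimated with ordinary least squares. The sketch below uses simulated random-walk data standing in for sales and web traffic; in practice you would fit real series (a library such as statsmodels also provides a full VAR implementation):

```python
import numpy as np

# Simulated stand-ins for the two series (rows = time steps, columns = series).
rng = np.random.default_rng(0)
y = rng.normal(size=(200, 2)).cumsum(axis=0)

# VAR(1): y_t = c + A @ y_{t-1} + e_t, estimated by ordinary least squares.
X = np.hstack([np.ones((len(y) - 1, 1)), y[:-1]])  # intercept + lagged values
B, *_ = np.linalg.lstsq(X, y[1:], rcond=None)      # (3, 2) coefficient matrix
c, A = B[0], B[1:].T                               # intercept and lag matrix

# One-step-ahead forecast for both series from the last observation.
forecast = c + A @ y[-1]
print(forecast.shape)  # (2,)
```

Each row of the lag matrix `A` shows how one series responds to the lagged values of every series, which is exactly the inter-series dynamic the table above describes.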

Machine learning models

Modern machine learning models like RNNs and CNNs are well-suited for multi-time series forecasting because they can learn complex nonlinear relationships:

  • Recurrent neural networks (RNN) – RNNs like LSTMs can model long-term temporal dependencies and relationships between series.
  • Convolutional neural networks (CNN) – CNNs can identify local patterns and relationships across multiple input time series.
  • Attention mechanisms – Attention layers allow models to learn which series and time steps are most related.

Deep learning models require more data and computing resources but can outperform classical models by learning complex inter-series dynamics.

Example RNN Model

Time Step    Input Series                    Network Layers
t            Sales(t), Web Traffic(t)        LSTM -> Dense
t+1          Sales(t+1), Web Traffic(t+1)    LSTM -> Dense

This RNN takes observations from multiple time series as input at each time step and passes them through LSTM and dense layers to output a prediction.
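The forward pass of such a model can be sketched in plain numpy. The weights here are randomly initialised (a real model would learn them by gradient descent, typically with a deep learning framework), so this only illustrates the data flow: all series at each time step feed one LSTM cell, and a dense head emits one prediction per series.

```python
import numpy as np

rng = np.random.default_rng(1)
n_series, hidden = 2, 8  # two input series (sales, web traffic), 8 hidden units

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Randomly initialised weights standing in for learned parameters.
Wx = rng.normal(scale=0.1, size=(4 * hidden, n_series))
Wh = rng.normal(scale=0.1, size=(4 * hidden, hidden))
b = np.zeros(4 * hidden)
W_out = rng.normal(scale=0.1, size=(n_series, hidden))  # dense head

def lstm_forecast(window):
    """Run an LSTM over a (T, n_series) window; return a next-step prediction."""
    h = np.zeros(hidden)
    c = np.zeros(hidden)
    for x_t in window:                    # x_t holds all series at time t
        z = Wx @ x_t + Wh @ h + b
        i, f, o, g = np.split(z, 4)       # input, forget, output gates + candidate
        i, f, o, g = sigmoid(i), sigmoid(f), sigmoid(o), np.tanh(g)
        c = f * c + i * g
        h = o * np.tanh(c)
    return W_out @ h                      # dense layer -> one value per series

window = rng.normal(size=(5, n_series))   # 5 time steps of (sales, traffic)
pred = lstm_forecast(window)
print(pred.shape)  # (2,)
```

Because the hidden state `h` is carried across time steps, the cell can accumulate information about how the two series move together, which is what lets it model inter-series dependencies.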

Model training

When training multi-time series models, the following techniques help guard against overfitting and improve generalizability:

  • Cross-validation – Use rolling origin or fixed origin CV to assess out-of-sample performance.
  • Regularization – Penalize model complexity via L1/L2 regularization, early stopping, dropout.
  • Ensemble modeling – Combine multiple models to improve robustness.

Tuning hyperparameters and model structure is also crucial to balance model flexibility, training time, and generalization performance.
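Rolling-origin cross-validation, mentioned above, can be implemented as a simple generator. This is a minimal sketch with an expanding training window; the observation counts and horizon are arbitrary illustration values (scikit-learn's `TimeSeriesSplit` offers a comparable ready-made splitter):

```python
def rolling_origin_splits(n_obs, initial_train, horizon, step=1):
    """Yield (train_idx, test_idx) pairs with an expanding training window."""
    train_end = initial_train
    while train_end + horizon <= n_obs:
        yield (list(range(train_end)),
               list(range(train_end, train_end + horizon)))
        train_end += step

# 10 observations: start with 6 training points, forecast 2 steps ahead,
# then roll the origin forward by 2.
for train_idx, test_idx in rolling_origin_splits(10, initial_train=6,
                                                 horizon=2, step=2):
    print(len(train_idx), test_idx)  # 6 [6, 7] then 8 [8, 9]
```

Unlike shuffled k-fold cross-validation, every test window lies strictly after its training window, so the evaluation respects the temporal ordering of the data.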

Prediction intervals

In addition to point forecasts, prediction intervals provide a range that future observations are expected to fall within with some probability. Intervals account for uncertainty and provide actionable insights for decision making. With neural networks, intervals can be constructed using:

  • Quantile loss – Directly optimizes quantile predictions as an output.
  • Bootstrapping – Trains models on resampled data and aggregates their forecasts into intervals.
  • Prediction distribution – Models aleatoric and epistemic uncertainty.

Prediction intervals give a sense of the range of plausible values, which is useful for assessing forecast confidence.
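The quantile (pinball) loss mentioned above is simple to state directly. The values below are toy numbers chosen to show the asymmetry that pushes a model trained on this loss toward the requested quantile:

```python
import numpy as np

def quantile_loss(y_true, y_pred, q):
    """Pinball loss: minimised in expectation when y_pred is the q-th quantile."""
    err = y_true - y_pred
    return np.mean(np.maximum(q * err, (q - 1.0) * err))

y_true = np.array([5.0, 5.0])
# For q = 0.9, under-prediction costs 9x more than over-prediction,
# so the fitted output is pulled toward the upper end of the distribution.
print(quantile_loss(y_true, np.array([4.0, 4.0]), 0.9))  # 0.9
print(quantile_loss(y_true, np.array([6.0, 6.0]), 0.9))  # 0.1
```

Training one output head with q = 0.1 and another with q = 0.9 yields lower and upper bounds that together form an approximate 80% prediction interval.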

Conclusion

In summary, multi-time series forecasting leverages relationships across series, but requires thoughtful data preparation, model specification, training procedures, and uncertainty estimation. Classical statistical models capture interpretable linear effects. Machine learning models like RNNs and CNNs can learn complex nonlinear inter-series and temporal dynamics. With proper implementation, multi-time series forecasts can significantly improve predictive accuracy across many real-world problems.