Time series forecasting is a process, and the only way to get good forecasts is to practice this process.
In this tutorial, you will discover how to forecast the annual water usage in Baltimore with Python.
Working through this tutorial will provide you with a framework for the steps and the tools for working through your own time series forecasting problems.
After completing this tutorial, you will know:
 How to confirm your Python environment and carefully define a time series forecasting problem.
 How to create a test harness for evaluating models, develop a baseline forecast, and better understand your problem with the tools of time series analysis.
 How to develop an autoregressive integrated moving average model, save it to file, and later load it to make predictions for new time steps.
Let’s get started.
Overview
In this tutorial, we will work through a time series forecasting project from end-to-end, from downloading the dataset and defining the problem to training a final model and making predictions.
This project is not exhaustive, but shows how you can get good results quickly by working through a time series forecasting problem systematically.
The steps of this project that we will work through are as follows.
 Environment.
 Problem Description.
 Test Harness.
 Persistence.
 Data Analysis.
 ARIMA Models.
 Model Validation.
This will provide a template for working through a time series prediction problem that you can use on your own dataset.
1. Environment
This tutorial assumes an installed and working SciPy environment and dependencies, including:
 SciPy
 NumPy
 Matplotlib
 Pandas
 scikit-learn
 statsmodels
If you need help installing Python and the SciPy environment on your workstation, consider the Anaconda distribution that manages much of it for you.
This script will help you check your installed versions of these libraries.

# scipy
import scipy
print('scipy: %s' % scipy.__version__)
# numpy
import numpy
print('numpy: %s' % numpy.__version__)
# matplotlib
import matplotlib
print('matplotlib: %s' % matplotlib.__version__)
# pandas
import pandas
print('pandas: %s' % pandas.__version__)
# scikit-learn
import sklearn
print('sklearn: %s' % sklearn.__version__)
# statsmodels
import statsmodels
print('statsmodels: %s' % statsmodels.__version__)
The results on my workstation used to write this tutorial are as follows:

scipy: 0.18.1
numpy: 1.11.2
matplotlib: 1.5.3
pandas: 0.19.1
sklearn: 0.18.1
statsmodels: 0.6.1
2. Problem Description
The problem is to predict annual water usage.
The dataset provides the annual water usage in Baltimore from 1885 to 1963, or 79 years of data.
The values are in the units of liters per capita per day, and there are 79 observations.
The dataset is credited to Hipel and McLeod, 1994.
You can learn more about this dataset and download it directly from DataMarket.
Download the dataset as a CSV file and place it in your current working directory with the filename “water.csv”.
3. Test Harness
We must develop a test harness to investigate the data and evaluate candidate models.
This involves two steps:
 Defining a Validation Dataset.
 Developing a Method for Model Evaluation.
3.1 Validation Dataset
The dataset is not current. This means that we cannot easily collect updated data to validate the model.
Therefore, we will pretend that it is 1953 and withhold the last 10 years of data from analysis and model selection.
This final decade of data will be used to validate the final model.
The code below will load the dataset as a Pandas Series and split into two, one for model development (dataset.csv) and the other for validation (validation.csv).

from pandas import Series
series = Series.from_csv('water.csv', header=0)
split_point = len(series) - 10
dataset, validation = series[0:split_point], series[split_point:]
print('Dataset %d, Validation %d' % (len(dataset), len(validation)))
dataset.to_csv('dataset.csv')
validation.to_csv('validation.csv')
Running the example creates two files and prints the number of observations in each.

Dataset 69, Validation 10 
The specific contents of these files are:
 dataset.csv: Observations from 1885 to 1953 (69 observations).
 validation.csv: Observations from 1954 to 1963 (10 observations).
The validation dataset is about 13% of the original dataset (10 of 79 observations).
Note that the saved datasets do not have a header line, therefore we do not need to cater to this when working with these files later.
3.2. Model Evaluation
Model evaluation will only be performed on the data in dataset.csv prepared in the previous section.
Model evaluation involves two elements:
 Performance Measure.
 Test Strategy.
3.2.1 Performance Measure
We will evaluate the performance of predictions using the root mean squared error (RMSE). This will give more weight to predictions that are grossly wrong and will have the same units as the original data.
Any transforms to the data must be reversed before the RMSE is calculated and reported to make the performance between different methods directly comparable.
We can calculate the RMSE using the mean_squared_error() helper function from the scikit-learn library, which calculates the mean squared error between a list of expected values (the test set) and the list of predictions. We can then take the square root of this value to give us an RMSE score.
For example:

from sklearn.metrics import mean_squared_error
from math import sqrt
...
test = ...
predictions = ...
mse = mean_squared_error(test, predictions)
rmse = sqrt(mse)
print('RMSE: %.3f' % rmse)
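To make the arithmetic behind the score concrete, here is a minimal sketch that computes the RMSE in plain Python, without scikit-learn. The function name rmse and the toy numbers are made up for illustration and are not part of the tutorial's harness.

```python
from math import sqrt

def rmse(expected, predicted):
    """Root mean squared error between two equal-length sequences."""
    if len(expected) != len(predicted):
        raise ValueError('expected and predicted must have the same length')
    squared_errors = [(e - p) ** 2 for e, p in zip(expected, predicted)]
    return sqrt(sum(squared_errors) / len(squared_errors))

# toy values (hypothetical, not taken from the water dataset)
expected = [10.0, 20.0, 30.0]
predicted = [12.0, 20.0, 26.0]
score = rmse(expected, predicted)
print('RMSE: %.3f' % score)
```

Because the errors are squared before averaging, the single error of 4 contributes four times as much to the score as the error of 2, which is why the RMSE gives more weight to grossly wrong predictions.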
3.2.2 Test Strategy
Candidate models will be evaluated using walk-forward validation.
This is because a rolling-forecast type model is required from the problem definition. This is where one-step forecasts are needed given all available data.
The walk-forward validation will work as follows:
 The first 50% of the dataset will be held back to train the model.
 The remaining 50% of the dataset will be iterated over to test the model.
 For each step in the test dataset:
  A model will be trained.
  A one-step prediction will be made and stored for later evaluation.
  The actual observation from the test dataset will be added to the training dataset for the next iteration.
 The predictions made during the enumeration of the test dataset will be evaluated and an RMSE score reported.
Given the small size of the data, we will allow a model to be retrained given all available data prior to each prediction.
We can write the code for the test harness using simple NumPy and Python code.
Firstly, we can split the dataset into train and test sets directly. We’re careful to always convert a loaded dataset to float32 in case the loaded data still has some String or Integer data types.

# prepare data
X = series.values
X = X.astype('float32')
train_size = int(len(X) * 0.50)
train, test = X[0:train_size], X[train_size:]
Next, we can iterate over the time steps in the test dataset. The train dataset is stored in a Python list as we need to easily append a new observation each iteration and NumPy array concatenation feels like overkill.
The prediction made by the model is called yhat by convention, as the outcome or observation is referred to as y and yhat (a ‘y’ with a mark above) is the mathematical notation for the prediction of the y variable.
The prediction and observation are printed at each step as a sanity check in case there are issues with the model.

# walk-forward validation
history = [x for x in train]
predictions = list()
for i in range(len(test)):
    # predict
    yhat = ...
    predictions.append(yhat)
    # observation
    obs = test[i]
    history.append(obs)
    print('>Predicted=%.3f, Expected=%3.f' % (yhat, obs))
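The harness can also be wrapped in a reusable function so that any forecasting rule can be plugged in. The sketch below is one way to do that in plain Python. The names walk_forward and mean_of_last_two are hypothetical, and the stand-in forecaster is only there to exercise the loop; it is not a model used in this tutorial.

```python
from math import sqrt

def walk_forward(values, forecast, train_fraction=0.5):
    """Run one-step walk-forward validation; return (predictions, test, rmse)."""
    train_size = int(len(values) * train_fraction)
    train, test = values[:train_size], values[train_size:]
    history = list(train)
    predictions = []
    for obs in test:
        predictions.append(forecast(history))  # forecast from data seen so far
        history.append(obs)                    # then reveal the true observation
    mse = sum((o - p) ** 2 for o, p in zip(test, predictions)) / len(test)
    return predictions, test, sqrt(mse)

def mean_of_last_two(history):
    """Hypothetical stand-in model: average of the two latest observations."""
    return (history[-1] + history[-2]) / 2.0

# toy series (made up for illustration)
toy = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
predictions, test, score = walk_forward(toy, mean_of_last_two)
print(predictions)
print('RMSE: %.3f' % score)
```

The important property is the ordering inside the loop: the forecast is made before the matching observation is appended to the history, so no prediction ever sees the value it is trying to predict.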
4. Persistence
The first step before getting bogged down in data analysis and modeling is to establish a baseline of performance.
This will provide both a template for evaluating models using the proposed test harness and a performance measure by which all more elaborate predictive models can be compared.
The baseline prediction for time series forecasting is called the naive forecast, or persistence.
This is where the observation from the previous time step is used as the prediction for the observation at the next time step.
We can plug this directly into the test harness defined in the previous section.
The complete code listing is provided below.

from pandas import Series
from sklearn.metrics import mean_squared_error
from math import sqrt
# load data
series = Series.from_csv('dataset.csv')
# prepare data
X = series.values
X = X.astype('float32')
train_size = int(len(X) * 0.50)
train, test = X[0:train_size], X[train_size:]
# walk-forward validation
history = [x for x in train]
predictions = list()
for i in range(len(test)):
    # predict
    yhat = history[-1]
    predictions.append(yhat)
    # observation
    obs = test[i]
    history.append(obs)
    print('>Predicted=%.3f, Expected=%3.f' % (yhat, obs))
# report performance
mse = mean_squared_error(test, predictions)
rmse = sqrt(mse)
print('RMSE: %.3f' % rmse)
Running the test harness prints the prediction and observation for each iteration of the test dataset.
The example ends by printing the RMSE for the model.
In this case, we can see that the persistence model achieved an RMSE of 21.975. This means that on average, the model was wrong by about 22 liters per capita per day for each prediction made.

…
>Predicted=613.000, Expected=598
>Predicted=598.000, Expected=575
>Predicted=575.000, Expected=564
>Predicted=564.000, Expected=549
>Predicted=549.000, Expected=538
RMSE: 21.975
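The printed lines make the persistence rule easy to verify by hand: each prediction is simply the previous expected value. The short sketch below replays the last five steps using only the six observations visible in the output above. The variable names are illustrative, and the RMSE it reports covers these five steps only, not the full test set.

```python
from math import sqrt

# last six observations of dataset.csv, as they appear in the output above
tail = [613.0, 598.0, 575.0, 564.0, 549.0, 538.0]

# persistence: the forecast for step t is the observation at step t-1
predictions = tail[:-1]
expected = tail[1:]

errors = [e - p for e, p in zip(expected, predictions)]
tail_rmse = sqrt(sum(err ** 2 for err in errors) / len(errors))
print(errors)
print('RMSE over these five steps: %.3f' % tail_rmse)
```

All five errors are negative because the series is falling over this stretch, and a persistence forecast always lags a trend by one step.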
We now have a baseline prediction method and performance; now we can start digging into our data.
5. Data Analysis
We can use summary statistics and plots of the data to quickly learn more about the structure of the prediction problem.
In this section, we will look at the data from four perspectives:
 Summary Statistics.
 Line Plot.
 Density Plots.
 Box and Whisker Plot.
5.1. Summary Statistics
Summary statistics provide a quick look at the limits of observed values. It can help to get a quick idea of what we are working with.
The example below calculates and prints summary statistics for the time series.

from pandas import Series
series = Series.from_csv('dataset.csv')
print(series.describe())
Running the example provides a number of summary statistics to review.
Some observations from these statistics include:
 The number of observations (count) matches our expectation, meaning we are handling the data correctly.
 The mean is about 500, which we might consider our level in this series.
 The standard deviation and percentiles suggest a reasonably tight spread around the mean.

count     69.000000
mean     500.478261
std       73.901685
min      344.000000
25%      458.000000
50%      492.000000
75%      538.000000
max      662.000000
5.2. Line Plot
A line plot of a time series dataset can provide a lot of insight into the problem.
The example below creates and shows a line plot of the dataset.

from pandas import Series
from matplotlib import pyplot
series = Series.from_csv('dataset.csv')
series.plot()
pyplot.show()
Run the example and review the plot. Note any obvious temporal structures in the series.
Some observations from the plot include:
 There looks to be an increasing trend in water usage over time.
 There do not appear to be any obvious outliers, although there are some large fluctuations.
 There is a downward trend for the last few years of the series.
There may be some benefit in explicitly modeling the trend component and removing it. You may also explore using differencing with one or two levels in order to make the series stationary.
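To see what one or two levels of differencing do, consider the sketch below. It is a plain-Python illustration on made-up toy series: one level of differencing flattens a linear trend, while a quadratic trend needs two levels. The helper name difference and its order argument are assumptions for this example.

```python
def difference(values, order=1):
    """Apply first differencing `order` times and return the result as a list."""
    result = list(values)
    for _ in range(order):
        result = [result[i] - result[i - 1] for i in range(1, len(result))]
    return result

# toy series (made up): one with a linear trend, one with a quadratic trend
linear = [3, 5, 7, 9, 11]
quadratic = [1, 4, 9, 16, 25]

linear_d1 = difference(linear, order=1)
quadratic_d1 = difference(quadratic, order=1)
quadratic_d2 = difference(quadratic, order=2)
print(linear_d1)
print(quadratic_d1)
print(quadratic_d2)
```

Each level of differencing shortens the series by one observation, which is worth keeping in mind on a dataset this small.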
5.3. Density Plot
Reviewing plots of the density of observations can provide further insight into the structure of the data.
The example below creates a histogram and density plot of the observations without any temporal structure.

from pandas import Series
from matplotlib import pyplot
series = Series.from_csv('dataset.csv')
pyplot.figure(1)
pyplot.subplot(211)
series.hist()
pyplot.subplot(212)
series.plot(kind='kde')
pyplot.show()
Run the example and review the plots.
Some observations from the plots include:
 The distribution is not Gaussian, but is pretty close.
 The distribution has a long right tail and may suggest an exponential distribution or a double Gaussian.
This suggests it may be worth exploring some power transforms of the data prior to modeling.
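As a rough idea of what such a transform involves, the sketch below implements the Box-Cox family for a fixed lambda in plain Python, together with its inverse. This is an illustration only: the tutorial does not apply a power transform to this dataset, the function names are made up, and lambda = 0.5 is an arbitrary choice. The three input values are the minimum, median, and maximum from the summary statistics above.

```python
from math import exp, log

def boxcox_transform(values, lmbda):
    """Box-Cox transform of strictly positive values for a fixed lambda."""
    if any(v <= 0 for v in values):
        raise ValueError('Box-Cox requires strictly positive values')
    if lmbda == 0:
        return [log(v) for v in values]
    return [(v ** lmbda - 1.0) / lmbda for v in values]

def boxcox_inverse(values, lmbda):
    """Invert boxcox_transform for the same lambda."""
    if lmbda == 0:
        return [exp(v) for v in values]
    return [(lmbda * v + 1.0) ** (1.0 / lmbda) for v in values]

# min, median and max of the series, from the summary statistics
sample = [344.0, 492.0, 662.0]
lmbda = 0.5  # arbitrary, square-root-like choice for illustration
transformed = boxcox_transform(sample, lmbda)
restored = boxcox_inverse(transformed, lmbda)
print(transformed)
print(restored)
```

The inverse matters because forecasts made on transformed data must be converted back to liters per capita per day before the RMSE is calculated, otherwise scores from different methods are not comparable.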
5.4. Box and Whisker Plots
We can group the annual data by decade and get an idea of the spread of observations for each decade and how this may be changing.
We do expect to see some trend (increasing mean or median), but it may be interesting to see how the rest of the distribution may be changing.
The example below groups the observations by decade and creates one box and whisker plot for each decade of observations. The last decade only contains 9 years and may not be a useful comparison with the other decades. Therefore only data between 1885 and 1944 was plotted.

from pandas import Series
from pandas import DataFrame
from pandas import TimeGrouper
from matplotlib import pyplot
series = Series.from_csv('dataset.csv')
groups = series['1885':'1944'].groupby(TimeGrouper('10AS'))
decades = DataFrame()
for name, group in groups:
    decades[name.year] = group.values
decades.boxplot()
pyplot.show()
Running the example creates 6 box and whisker plots side-by-side, one for each of the 6 decades of selected data.
Some observations from reviewing the plot include:
 The median values for each decade (red line) may show an increasing trend that may not be linear.
 The spread, or middle 50% of the data (blue boxes), does show some variability.
 There may be outliers in some decades (crosses outside of the box and whiskers).
 The second to last decade seems to have a lower average consumption, perhaps related to the first world war.
This decade-level view of the data is an interesting avenue and could be pursued further by looking at summary statistics from decade-to-decade and changes in summary statistics.
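One way to follow up on decade-level summary statistics without any plotting is sketched below. It groups annual values into 10-year bins that start at the first year of the series and reports the median of each bin. The function name group_by_decade and the toy values are invented for illustration; with the real data you would pass the years and values loaded from dataset.csv.

```python
from statistics import median

def group_by_decade(years, values, start_year):
    """Group annual values into 10-year bins that begin at start_year."""
    groups = {}
    for year, value in zip(years, values):
        decade_start = start_year + 10 * ((year - start_year) // 10)
        groups.setdefault(decade_start, []).append(value)
    return groups

# toy data (made up): 20 years with a steadily rising value
years = list(range(1885, 1905))
values = [float(400 + i) for i in range(20)]

groups = group_by_decade(years, values, start_year=1885)
medians = {decade: median(vals) for decade, vals in groups.items()}
for decade in sorted(medians):
    print('%d: median=%.1f, n=%d' % (decade, medians[decade], len(groups[decade])))
```

Printing the number of observations per bin makes a short final decade easy to spot, which is the reason the last, incomplete decade was left out of the plot above.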
6. ARIMA Models
In this section, we will develop Autoregressive Integrated Moving Average or ARIMA models for the problem.
We will approach modeling by both manual and automatic configuration of the ARIMA model. This will be followed by a third step of investigating the residual errors of the chosen model.
As such, this section is broken down into 3 steps:
 Manually Configure the ARIMA.
 Automatically Configure the ARIMA.
 Review Residual Errors.
6.1 Manually Configured ARIMA
The ARIMA(p,d,q) model requires three parameters and is traditionally configured manually.
Analysis of the time series data assumes that we are working with a stationary time series.
The time series is likely nonstationary. We can make it stationary by first differencing the series and using a statistical test to confirm that the result is stationary.
The example below creates a stationary version of the series and saves it to file stationary.csv.

from pandas import Series
from statsmodels.tsa.stattools import adfuller
from matplotlib import pyplot

# create a differenced series
def difference(dataset):
    diff = list()
    for i in range(1, len(dataset)):
        value = dataset[i] - dataset[i - 1]
        diff.append(value)
    return Series(diff)

series = Series.from_csv('dataset.csv')
X = series.values
X = X.astype('float32')
# difference data
stationary = difference(X)
stationary.index = series.index[1:]
# check if stationary
result = adfuller(stationary)
print('ADF Statistic: %f' % result[0])
print('p-value: %f' % result[1])
print('Critical Values:')
for key, value in result[4].items():
    print('\t%s: %.3f' % (key, value))
# plot differenced data
stationary.plot()
pyplot.show()
# save
stationary.to_csv('stationary.csv')
Running the example outputs the result of a statistical significance test of whether the differenced series is stationary, specifically the augmented Dickey-Fuller test.
The results show that the test statistic value of -6.126719 is smaller than the critical value at 1% of -3.534. This suggests that we can reject the null hypothesis with a significance level of less than 1% (i.e. a low probability that the result is a statistical fluke).
Rejecting the null hypothesis means that the process has no unit root, and in turn that the time series is stationary or does not have time-dependent structure.

ADF Statistic: -6.126719
p-value: 0.000000
Critical Values:
	5%: -2.906
	1%: -3.534
	10%: -2.591
This suggests that at least one level of differencing is required. The d parameter in our ARIMA model should at least be a value of 1.
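The decision rule for the test can be written down in a couple of lines. The sketch below compares a test statistic against each critical value, with the signs written explicitly (the statistic and critical values of this test are negative here, and more negative means stronger evidence against a unit root). The function name unit_root_rejected and the second, borderline statistic of -3.0 are hypothetical.

```python
def unit_root_rejected(adf_statistic, critical_values):
    """Map each significance level to True when the statistic is below its critical value."""
    return {level: adf_statistic < critical
            for level, critical in critical_values.items()}

# critical values and statistic reported for the differenced series
critical_values = {'1%': -3.534, '5%': -2.906, '10%': -2.591}
decision = unit_root_rejected(-6.126719, critical_values)
print(decision)

# a hypothetical borderline statistic, for contrast
borderline = unit_root_rejected(-3.0, critical_values)
print(borderline)
```

For the differenced water series the null hypothesis is rejected at every level, whereas the borderline value would only be rejected at the 5% and 10% levels.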
A plot of the differenced data is also created. It suggests that this has indeed removed the increasing trend.
The next step is to select the lag values for the Autoregression (AR) and Moving Average (MA) parameters, p and q respectively.
We can do this by reviewing Autocorrelation Function (ACF) and Partial Autocorrelation Function (PACF) plots.
The example below creates ACF and PACF plots for the series.

from pandas import Series
from statsmodels.graphics.tsaplots import plot_acf
from statsmodels.graphics.tsaplots import plot_pacf
from matplotlib import pyplot
series = Series.from_csv('dataset.csv')
pyplot.figure()
pyplot.subplot(211)
plot_acf(series, ax=pyplot.gca())
pyplot.subplot(212)
plot_pacf(series, ax=pyplot.gca())
pyplot.show()
Run the example and review the plots for insights into how to set the p and q variables for the ARIMA model.
Below are some observations from the plots.
 The ACF shows no significant lags.
 The PACF also shows no significant lags.
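To make "significant lag" concrete, the sketch below computes the sample autocorrelation in plain Python and flags lags whose value falls outside the usual large-sample bound of 1.96 / sqrt(n) for white noise. This is an approximation for illustration; the shaded region drawn by statsmodels may be computed differently. The function names and the toy series are made up.

```python
from math import sqrt

def autocorrelation(values, lag):
    """Sample autocorrelation of a sequence at the given positive lag."""
    n = len(values)
    mean = sum(values) / float(n)
    denominator = sum((v - mean) ** 2 for v in values)
    numerator = sum((values[t] - mean) * (values[t - lag] - mean)
                    for t in range(lag, n))
    return numerator / denominator

def significant_lags(values, max_lag):
    """Lags whose autocorrelation exceeds the approximate 95% white-noise bound."""
    bound = 1.96 / sqrt(len(values))
    return [lag for lag in range(1, max_lag + 1)
            if abs(autocorrelation(values, lag)) > bound]

# toy series (made up): a strictly alternating signal, which is strongly autocorrelated
toy = [1.0, -1.0] * 20
lag1 = autocorrelation(toy, 1)
flagged = significant_lags(toy, max_lag=2)
print('lag 1 autocorrelation: %.3f' % lag1)
print(flagged)
```

A series with no significant lags would return an empty list here, which is the situation described in the observations above.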
A good starting point for the p and q values is therefore 0.
This quick analysis suggests an ARIMA(0,1,0) on the raw data may be a good starting point.
This is in fact a persistence model. The complete example is listed below.

from pandas import Series
from sklearn.metrics import mean_squared_error
from statsmodels.tsa.arima_model import ARIMA
from math import sqrt
# load data
series = Series.from_csv('dataset.csv')
# prepare data
X = series.values
X = X.astype('float32')
train_size = int(len(X) * 0.50)
train, test = X[0:train_size], X[train_size:]
# walk-forward validation
history = [x for x in train]
predictions = list()
for i in range(len(test)):
    # predict
    model = ARIMA(history, order=(0,1,0))
    model_fit = model.fit(disp=0)
    yhat = model_fit.forecast()[0]
    predictions.append(yhat)
    # observation
    obs = test[i]
    history.append(obs)
    print('>Predicted=%.3f, Expected=%3.f' % (yhat, obs))
# report performance
mse = mean_squared_error(test, predictions)
rmse = sqrt(mse)
print('RMSE: %.3f' % rmse)
Running this example results in an RMSE of 22.311, which is slightly higher than the persistence model above.
This may be because of the details of the ARIMA implementation, such as an automatic trend constant that is calculated and added.

…
>Predicted=617.079, Expected=598
>Predicted=601.781, Expected=575
>Predicted=578.369, Expected=564
>Predicted=567.152, Expected=549
>Predicted=551.881, Expected=538
RMSE: 22.311
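The gap between these predictions and the plain persistence forecasts is consistent with a drift term: each prediction above sits a few units higher than the previous observation (for example 617.079 against 613). The sketch below shows the idea in plain Python, under the assumption that the constant in an ARIMA(0,1,0) model behaves like the mean of the first differences. The function name and toy history are made up, and the estimate produced by statsmodels may differ in detail.

```python
def random_walk_forecast(history, with_drift=False):
    """One-step forecast for a random walk, optionally adding the mean first difference."""
    last = history[-1]
    if not with_drift:
        return last  # identical to the persistence forecast
    diffs = [history[i] - history[i - 1] for i in range(1, len(history))]
    return last + sum(diffs) / float(len(diffs))

# toy history (made up for illustration)
history = [100.0, 104.0, 103.0, 109.0]
no_drift = random_walk_forecast(history)
drift = random_walk_forecast(history, with_drift=True)
print(no_drift)
print(drift)
```

When a series that has risen for most of its history starts to fall, as the water series does in its final years, a positive drift pushes every forecast in the wrong direction, which would explain the slightly worse RMSE.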
6.2 Grid Search ARIMA Hyperparameters
The ACF and PACF plots suggest that we cannot do better than a persistence model on this dataset.
To confirm this analysis, we can grid search a suite of ARIMA hyperparameters and check that no models result in better out of sample RMSE performance.
In this section, we will search values of p, d, and q for combinations (skipping those that fail to converge), and find the combination that results in the best performance. We will use a grid search to explore all combinations in a subset of integer values.
Specifically, we will search all combinations of the following parameters:
 p: 0 to 4.
 d: 0 to 2.
 q: 0 to 4.
This is (5 * 3 * 5), or 75 potential runs of the test harness, and will take some time to execute.
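The size of the grid can be checked by enumerating it directly. The sketch below uses itertools.product from the standard library as an alternative to three nested loops; it only counts the configurations and does not fit any models.

```python
from itertools import product

p_values = range(0, 5)
d_values = range(0, 3)
q_values = range(0, 5)

# every (p, d, q) combination in the search space
orders = list(product(p_values, d_values, q_values))
print(len(orders))
print(orders[0], orders[-1])
```

Each of these orders is evaluated with a full walk-forward pass, so the number of individual model fits is the number of orders multiplied by the number of test steps.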
We will also disable the automatic addition of a trend constant from the model by setting the 'trend' argument to 'nc' for no constant when calling fit().
The complete worked example with the grid search version of the test harness is listed below.

import warnings
from pandas import Series
from statsmodels.tsa.arima_model import ARIMA
from sklearn.metrics import mean_squared_error
from math import sqrt

# evaluate an ARIMA model for a given order (p,d,q) and return RMSE
def evaluate_arima_model(X, arima_order):
    # prepare training dataset
    X = X.astype('float32')
    train_size = int(len(X) * 0.50)
    train, test = X[0:train_size], X[train_size:]
    history = [x for x in train]
    # make predictions
    predictions = list()
    for t in range(len(test)):
        model = ARIMA(history, order=arima_order)
        # model_fit = model.fit(disp=0)
        model_fit = model.fit(trend='nc', disp=0)
        yhat = model_fit.forecast()[0]
        predictions.append(yhat)
        history.append(test[t])
    # calculate out of sample error
    mse = mean_squared_error(test, predictions)
    rmse = sqrt(mse)
    return rmse

# evaluate combinations of p, d and q values for an ARIMA model
def evaluate_models(dataset, p_values, d_values, q_values):
    dataset = dataset.astype('float32')
    best_score, best_cfg = float("inf"), None
    for p in p_values:
        for d in d_values:
            for q in q_values:
                order = (p,d,q)
                try:
                    mse = evaluate_arima_model(dataset, order)
                    if mse < best_score:
                        best_score, best_cfg = mse, order
                    print('ARIMA%s RMSE=%.3f' % (order,mse))
                except:
                    continue
    print('Best ARIMA%s RMSE=%.3f' % (best_cfg, best_score))

# load dataset
series = Series.from_csv('dataset.csv')
# evaluate parameters
p_values = range(0, 5)
d_values = range(0, 3)
q_values = range(0, 5)
warnings.filterwarnings("ignore")
evaluate_models(series.values, p_values, d_values, q_values)
Running the example runs through all combinations and reports the results on those that converge without error. The example takes a little over 2 minutes to run on modern hardware.
The results show that the best configuration discovered was ARIMA(2, 1, 0) with an RMSE of 21.733. This is slightly lower than both the persistence baseline (21.975) and the manually configured ARIMA(0, 1, 0) model (22.311) tested earlier, although the difference may or may not be statistically significant.

…
ARIMA(4, 1, 0) RMSE=24.802
ARIMA(4, 1, 1) RMSE=25.103
ARIMA(4, 2, 0) RMSE=27.089
ARIMA(4, 2, 1) RMSE=25.932
ARIMA(4, 2, 2) RMSE=25.418
Best ARIMA(2, 1, 0) RMSE=21.733
We will select this ARIMA(2, 1, 0) model going forward.
6.3 Review Residual Errors
A good final check of a model is to review residual forecast errors.
Ideally, the distribution of residual errors should be a Gaussian with a zero mean.
We can check this by using summary statistics and plots to investigate the residual errors from the ARIMA(2, 1, 0) model. The example below calculates and summarizes the residual forecast errors.

from pandas import Series
from pandas import DataFrame
from sklearn.metrics import mean_squared_error
from statsmodels.tsa.arima_model import ARIMA
from math import sqrt
from matplotlib import pyplot
# load data
series = Series.from_csv('dataset.csv')
# prepare data
X = series.values
X = X.astype('float32')
train_size = int(len(X) * 0.50)
train, test = X[0:train_size], X[train_size:]
# walk-forward validation
history = [x for x in train]
predictions = list()
for i in range(len(test)):
    # predict
    model = ARIMA(history, order=(2,1,0))
    model_fit = model.fit(trend='nc', disp=0)
    yhat = model_fit.forecast()[0]
    predictions.append(yhat)
    # observation
    obs = test[i]
    history.append(obs)
# errors
residuals = [test[i]-predictions[i] for i in range(len(test))]
residuals = DataFrame(residuals)
print(residuals.describe())
pyplot.figure()
pyplot.subplot(211)
residuals.hist(ax=pyplot.gca())
pyplot.subplot(212)
residuals.plot(kind='kde', ax=pyplot.gca())
pyplot.show()
Running the example first describes the distribution of the residuals.
We can see that the distribution has a right shift and that the mean is nonzero at 1.081624.
This is perhaps a sign that the predictions are biased.

count    35.000000
mean      1.081624
std      22.022566
min     -52.103811
25%     -16.202283
50%      -0.459801
75%      12.085091
max      51.284336
The distribution of residual errors is also plotted.
The graphs suggest a Gaussian-like distribution with a longer right tail, providing further evidence that perhaps a power transform might be worth exploring.
We could use this information to bias-correct predictions by adding the mean residual error of 1.081624 to each forecast made.
The example below performs this bias-correction.

from pandas import Series
from pandas import DataFrame
from sklearn.metrics import mean_squared_error
from statsmodels.tsa.arima_model import ARIMA
from math import sqrt
from matplotlib import pyplot
# load data
series = Series.from_csv('dataset.csv')
# prepare data
X = series.values
X = X.astype('float32')
train_size = int(len(X) * 0.50)
train, test = X[0:train_size], X[train_size:]
# walk-forward validation
history = [x for x in train]
predictions = list()
bias = 1.081624
for i in range(len(test)):
    # predict
    model = ARIMA(history, order=(2,1,0))
    model_fit = model.fit(trend='nc', disp=0)
    yhat = bias + float(model_fit.forecast()[0])
    predictions.append(yhat)
    # observation
    obs = test[i]
    history.append(obs)
# report performance
mse = mean_squared_error(test, predictions)
rmse = sqrt(mse)
print('RMSE: %.3f' % rmse)
# summarise residual errors
residuals = [test[i]-predictions[i] for i in range(len(test))]
residuals = DataFrame(residuals)
print(residuals.describe())
# plot residual errors
pyplot.figure()
pyplot.subplot(211)
residuals.hist(ax=pyplot.gca())
pyplot.subplot(212)
residuals.plot(kind='kde', ax=pyplot.gca())
pyplot.show()
The performance of the predictions is improved very slightly from 21.733 to 21.706, which may or may not be significant.
The summary of the forecast residual errors shows that the mean was indeed moved to a value very close to zero.

RMSE: 21.706
                  0
count  3.500000e+01
mean   3.537544e-07
std    2.202257e+01
min   -5.318543e+01
25%   -1.728391e+01
50%   -1.541425e+00
75%    1.100347e+01
max    5.020271e+01
Finally, density plots of the residual error do show a small shift towards zero.
It is debatable whether this bias correction is worth it, but we will use it for now.
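The mechanics of the correction are simple enough to check on a handful of numbers. The sketch below uses made-up expected and predicted values to show that adding the mean residual to every prediction moves the mean of the new residuals to zero while leaving their spread unchanged. All names and values here are illustrative.

```python
def mean(values):
    """Arithmetic mean of a non-empty sequence."""
    return sum(values) / float(len(values))

# toy values (made up): the model under-predicts on average
expected = [100.0, 110.0, 120.0, 130.0]
predicted = [98.0, 109.0, 117.0, 128.0]

residuals = [e - p for e, p in zip(expected, predicted)]
bias = mean(residuals)

# shift every prediction by the mean residual
corrected = [p + bias for p in predicted]
corrected_residuals = [e - p for e, p in zip(expected, corrected)]

print('bias: %.3f' % bias)
print('mean residual after correction: %.3f' % mean(corrected_residuals))
```

Note that in the example above the bias was estimated from the same test predictions it is then applied to, so the small RMSE improvement it reports is likely to be optimistic.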
7. Model Validation
After models have been developed and a final model selected, it must be validated and finalized.
Validation is an optional part of the process, but one that provides a ‘last check’ to ensure we have not fooled or misled ourselves.
This section includes the following steps:
 Finalize Model: Train and save the final model.
 Make Prediction: Load the finalized model and make a prediction.
 Validate Model: Load and validate the final model.
7.1 Finalize Model
Finalizing the model involves fitting an ARIMA model on the entire dataset, in this case on all of the observations in dataset.csv.
Once fit, the model can be saved to file for later use.
The example below trains an ARIMA(2,1,0) model on the dataset and saves the whole fit object and the bias to file.
There is a bug in the current stable version of the statsmodels library (v0.6.1) that results in an error when you try to load a saved ARIMA model from file. The error reported is:

TypeError: __new__() takes at least 3 arguments (1 given) 
This bug also seems present in the 0.8 release candidate 1 of statsmodels when I tested it. For more details, see Zae Myung Kim’s discussion and fix of this GitHub issue.
We can work around this with a monkey patch that adds a __getnewargs__() instance function to the ARIMA class before saving.
The example below saves the fit model to file in the correct state so that it can be loaded successfully later.

from pandas import Series
from statsmodels.tsa.arima_model import ARIMA
import numpy

# monkey patch around bug in ARIMA class
def __getnewargs__(self):
    return ((self.endog),(self.k_lags, self.k_diff, self.k_ma))

ARIMA.__getnewargs__ = __getnewargs__

# load data
series = Series.from_csv('dataset.csv')
# prepare data
X = series.values
X = X.astype('float32')
# fit model
model = ARIMA(X, order=(2,1,0))
model_fit = model.fit(trend='nc', disp=0)
# bias constant, could be calculated from in-sample mean residual
bias = 1.081624
# save model
model_fit.save('model.pkl')
numpy.save('model_bias.npy', [bias])
Running the example creates two local files:
 model.pkl: This is the ARIMAResults object from the call to ARIMA.fit(). This includes the coefficients and all other internal data returned when fitting the model.
 model_bias.npy: This is the bias value stored as a one-element NumPy array.
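To see why adding __getnewargs__() helps, it is useful to look at how Python rebuilds an object whose __new__() method requires arguments. The sketch below uses two simplified stand-in classes (they are not statsmodels code) and replays by hand the reconstruction recipe returned by __reduce_ex__(), which is what pickle records when saving and follows when loading. The class names and the rebuild() helper are hypothetical.

```python
class PatchedModel(object):
    """Simplified stand-in for a class whose __new__ needs arguments."""

    def __new__(cls, endog, order):
        return super(PatchedModel, cls).__new__(cls)

    def __init__(self, endog, order):
        self.endog = endog
        self.order = order

    def __getnewargs__(self):
        # arguments that will be passed to __new__ on reconstruction
        return (self.endog, self.order)


class UnpatchedModel(object):
    """The same stand-in class without __getnewargs__."""

    def __new__(cls, endog, order):
        return super(UnpatchedModel, cls).__new__(cls)

    def __init__(self, endog, order):
        self.endog = endog
        self.order = order


def rebuild(obj):
    """Replay the reconstruction recipe that pickle would store for obj."""
    func, args, state = obj.__reduce_ex__(2)[:3]
    clone = func(*args)            # calls cls.__new__(cls, *newargs)
    clone.__dict__.update(state)   # restore the instance attributes
    return clone


restored = rebuild(PatchedModel([1.0, 2.0, 3.0], (2, 1, 0)))
print(restored.order)

try:
    rebuild(UnpatchedModel([1.0, 2.0, 3.0], (2, 1, 0)))
    failed = False
except TypeError as error:
    failed = True
    print('TypeError: %s' % error)
```

Without __getnewargs__(), the recipe calls __new__() with only the class, which fails in the same way as the error reported above. With it, the stored arguments are supplied and the object is rebuilt before its attributes are restored.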
7.2 Make Prediction
A natural case may be to load the model and make a single forecast.
This is relatively straightforward and involves restoring the saved model and the bias and calling the forecast() function.
The example below loads the model, makes a prediction for the next time step, and prints the prediction.

from pandas import Series
from statsmodels.tsa.arima_model import ARIMAResults
import numpy
model_fit = ARIMAResults.load('model.pkl')
bias = numpy.load('model_bias.npy')
yhat = bias + float(model_fit.forecast()[0])
print('Predicted: %.3f' % yhat)
Running the example prints the prediction of about 540.
If we peek inside validation.csv, we can see that the value on the first row for the next time period is 568. The prediction is in the right ballpark.
7.3 Validate Model
We can load the model and use it in a pretend operational manner.
In the test harness section, we saved the final 10 years of the original dataset in a separate file to validate the final model.
We can load this validation.csv file now and use it to see how good our model really is on “unseen” data.
There are two ways we might proceed:
 Load the model and use it to forecast the next 10 years. The forecast beyond the first one or two years will quickly start to degrade in skill.
 Load the model and use it in a rolling-forecast manner, updating the transform and model for each time step. This is the preferred method as it is how one would use this model in practice as it would achieve the best performance.
As with model evaluation in the previous sections, we will make predictions in a rolling-forecast manner. This means that we will step over lead times in the validation dataset and take the observations as an update to the history.

from pandas import Series
from matplotlib import pyplot
from statsmodels.tsa.arima_model import ARIMA
from statsmodels.tsa.arima_model import ARIMAResults
from sklearn.metrics import mean_squared_error
from math import sqrt
import numpy
# load and prepare datasets
dataset = Series.from_csv('dataset.csv')
X = dataset.values.astype('float32')
history = [x for x in X]
validation = Series.from_csv('validation.csv')
y = validation.values.astype('float32')
# load model
model_fit = ARIMAResults.load('model.pkl')
bias = numpy.load('model_bias.npy')
# make first prediction
predictions = list()
yhat = bias + float(model_fit.forecast()[0])
predictions.append(yhat)
history.append(y[0])
print('>Predicted=%.3f, Expected=%3.f' % (yhat, y[0]))
# rolling forecasts
for i in range(1, len(y)):
    # predict
    model = ARIMA(history, order=(2,1,0))
    model_fit = model.fit(trend='nc', disp=0)
    yhat = bias + float(model_fit.forecast()[0])
    predictions.append(yhat)
    # observation
    obs = y[i]
    history.append(obs)
    print('>Predicted=%.3f, Expected=%3.f' % (yhat, obs))
# report performance
mse = mean_squared_error(y, predictions)
rmse = sqrt(mse)
print('RMSE: %.3f' % rmse)
pyplot.plot(y)
pyplot.plot(predictions, color='red')
pyplot.show()
Running the example prints each prediction and expected value for the time steps in the validation dataset.
The final RMSE for the validation period is about 16.5 liters per capita per day. This is not too different from the expected error of 21.7, but I would expect that it is also not too different from a simple persistence model.

>Predicted=540.013, Expected=568
>Predicted=571.589, Expected=575
>Predicted=573.289, Expected=579
>Predicted=579.561, Expected=587
>Predicted=588.063, Expected=602
>Predicted=603.022, Expected=594
>Predicted=593.178, Expected=587
>Predicted=588.558, Expected=587
>Predicted=588.797, Expected=625
>Predicted=627.941, Expected=613
RMSE: 16.532
A plot of the predictions compared to the validation dataset is also provided.
The forecast does have the characteristics of a persistence forecast. This suggests that although this time series does have an obvious trend, it is still a reasonably difficult problem.
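The comparison with persistence can be checked directly from the printed values. The sketch below recomputes the RMSE of the ARIMA predictions from the validation output and compares it with a persistence forecast over the same ten years. It assumes that 538, the last expected value in the persistence output from Section 4, is the final observation in dataset.csv and therefore the persistence forecast for the first validation year. The helper name rmse is illustrative.

```python
from math import sqrt

def rmse(expected, predicted):
    """Root mean squared error between two equal-length sequences."""
    squared = [(e - p) ** 2 for e, p in zip(expected, predicted)]
    return sqrt(sum(squared) / len(squared))

# expected values and ARIMA predictions copied from the validation output
expected = [568.0, 575.0, 579.0, 587.0, 602.0,
            594.0, 587.0, 587.0, 625.0, 613.0]
arima_predictions = [540.013, 571.589, 573.289, 579.561, 588.063,
                     603.022, 593.178, 588.558, 588.797, 627.941]

# persistence baseline: each forecast is the previous observation
last_training_obs = 538.0  # assumed final observation of dataset.csv
persistence_predictions = [last_training_obs] + expected[:-1]

arima_rmse = rmse(expected, arima_predictions)
persistence_rmse = rmse(expected, persistence_predictions)
print('ARIMA RMSE: %.3f' % arima_rmse)
print('Persistence RMSE: %.3f' % persistence_rmse)
```

Under these assumptions the persistence baseline scores roughly 17.2 on the validation period, which is close to the model's 16.5 and supports the reading that the forecast behaves much like persistence.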
Summary
In this tutorial, you discovered the steps and the tools for a time series forecasting project with Python.
We covered a lot of ground in this tutorial; specifically:
 How to develop a test harness with a performance measure and evaluation method and how to quickly develop a baseline forecast and skill.
 How to use time series analysis to raise ideas for how to best model the forecast problem.
 How to develop an ARIMA model, save it, and later load it to make predictions on new data.
How did you do? Do you have any questions about this tutorial?
Ask your questions in the comments below and I will do my best to answer.