The data for your sequence prediction problem probably needs to be scaled when training a neural network, such as a Long Short-Term Memory recurrent neural network.
When a network is fit on unscaled data with a wide range of values (e.g. quantities in the 10s to 100s), large inputs can slow down learning and convergence and, in some cases, prevent the network from effectively learning your problem.
In this tutorial, you will discover how to normalize and standardize your sequence prediction data and how to decide which to use for your input and output variables.
After completing this tutorial, you will know:
 How to normalize and standardize sequence data in Python.
 How to select the appropriate scaling for input and output variables.
 Practical considerations when scaling sequence data.
Let’s get started.
Tutorial Overview
This tutorial is divided into 4 parts; they are:
 Scaling Series Data
 Scaling Input Variables
 Scaling Output Variables
 Practical Considerations When Scaling
Scaling Series Data in Python
There are two types of scaling of your series that you may want to consider: normalization and standardization.
These can both be achieved using the scikit-learn library.
Normalize Series Data
Normalization is a rescaling of the data from the original range so that all values are within the range of 0 and 1.
Normalization requires that you know or are able to accurately estimate the minimum and maximum observable values. You may be able to estimate these values from your available data. If your time series is trending up or down, estimating these expected values may be difficult and normalization may not be the best method to use on your problem.
A value is normalized as follows:

    y = (x - min) / (max - min)
Where the minimum and maximum values pertain to the value x being normalized.
For example, for a dataset, we could guesstimate the min and max observable values as -10 and 30. We can then normalize any value, like 18.8, as follows:

    y = (x - min) / (max - min)
    y = (18.8 - (-10)) / (30 - (-10))
    y = 28.8 / 40
    y = 0.72
You can see that if an x value is provided that is outside the bounds of the minimum and maximum values, that the resulting value will not be in the range of 0 and 1. You could check for these observations prior to making predictions and either remove them from the dataset or limit them to the predefined maximum or minimum values.
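As a minimal sketch of the limiting option described above, the function below applies the normalization formula and optionally clips out-of-range values to the bounds first. The function name `normalize` and its `clip` parameter are illustrative assumptions, not part of any library, and the bounds of -10 and 30 follow the arithmetic of the worked example.

```python
def normalize(x, min_value, max_value, clip=True):
    """Rescale x to the range 0-1 given the expected min and max."""
    if clip:
        # Limit out-of-range observations to the predefined bounds.
        x = max(min_value, min(x, max_value))
    return (x - min_value) / (max_value - min_value)

# Bounds taken from the worked example (assumed min of -10 and max of 30).
print(normalize(18.8, -10.0, 30.0))              # value inside the bounds
print(normalize(35.0, -10.0, 30.0))              # clipped to the maximum
print(normalize(35.0, -10.0, 30.0, clip=False))  # not clipped, exceeds 1
```

Whether to clip or to remove such observations depends on your problem; clipping keeps the observation but discards how far outside the bounds it was.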
You can normalize your dataset using the scikit-learn object MinMaxScaler.
Good practice usage with the MinMaxScaler and other scaling techniques is as follows:
 Fit the scaler using available training data. For normalization, this means the training data will be used to estimate the minimum and maximum observable values. This is done by calling the fit() function.
 Apply the scale to training data. This means you can use the normalized data to train your model. This is done by calling the transform() function.
 Apply the scale to data going forward. This means you can prepare new data in the future on which you want to make predictions.
If needed, the transform can be inverted. This is useful for converting predictions back into their original scale for reporting or plotting. This can be done by calling the inverse_transform() function.
Below is an example of normalizing a contrived sequence of 10 quantities.
The scaler object requires data to be provided as a matrix of rows and columns. The contrived series is created as a Pandas Series, so its values are reshaped into a single-column matrix before scaling.

    from pandas import Series
    from sklearn.preprocessing import MinMaxScaler
    # define contrived series
    data = [10.0, 20.0, 30.0, 40.0, 50.0, 60.0, 70.0, 80.0, 90.0, 100.0]
    series = Series(data)
    print(series)
    # prepare data for normalization
    values = series.values
    values = values.reshape((len(values), 1))
    # train the normalization
    scaler = MinMaxScaler(feature_range=(0, 1))
    scaler = scaler.fit(values)
    # data_min_ and data_max_ hold one entry per column; take the first
    print('Min: %f, Max: %f' % (scaler.data_min_[0], scaler.data_max_[0]))
    # normalize the dataset and print
    normalized = scaler.transform(values)
    print(normalized)
    # inverse transform and print
    inversed = scaler.inverse_transform(normalized)
    print(inversed)

Running the example prints the sequence, prints the min and max values estimated from the sequence, prints the normalized sequence, then prints the values back in their original scale using the inverse transform.
We can also see that the minimum and maximum values of the dataset are 10.0 and 100.0 respectively.

    0     10.0
    1     20.0
    2     30.0
    3     40.0
    4     50.0
    5     60.0
    6     70.0
    7     80.0
    8     90.0
    9    100.0
    Min: 10.000000, Max: 100.000000
    [[ 0.        ]
     [ 0.11111111]
     [ 0.22222222]
     [ 0.33333333]
     [ 0.44444444]
     [ 0.55555556]
     [ 0.66666667]
     [ 0.77777778]
     [ 0.88888889]
     [ 1.        ]]
    [[  10.]
     [  20.]
     [  30.]
     [  40.]
     [  50.]
     [  60.]
     [  70.]
     [  80.]
     [  90.]
     [ 100.]]

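The good practice steps listed earlier (fit on training data, then apply the same scale going forward) can be sketched as follows. The split into `train` and `new` arrays is a hypothetical illustration; only `MinMaxScaler` and its `fit()`, `transform()`, and `inverse_transform()` functions come from scikit-learn.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Hypothetical split: training data is available now, new data arrives later.
train = np.array([10.0, 20.0, 30.0, 40.0, 50.0, 60.0, 70.0, 80.0]).reshape(-1, 1)
new = np.array([90.0, 100.0, 45.0]).reshape(-1, 1)

# Fit on the training data only, so min and max are estimated from it.
scaler = MinMaxScaler(feature_range=(0, 1))
scaler.fit(train)
train_scaled = scaler.transform(train)

# Reuse the same fitted scaler for new data. Values above the training
# maximum map to values greater than 1.
new_scaled = scaler.transform(new)
print(new_scaled.ravel())

# Convert scaled values (for example, predictions) back to the original scale.
restored = scaler.inverse_transform(new_scaled)
print(restored.ravel())
```

Because the new values 90.0 and 100.0 exceed the training maximum, their scaled values fall outside the range of 0 and 1, which is the situation the earlier advice about checking bounds is meant to catch.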
Standardize Series Data
Standardizing a dataset involves rescaling the distribution of values so that the mean of observed values is 0 and the standard deviation is 1.
This can be thought of as subtracting the mean value or centering the data.
Like normalization, standardization can be useful, and even required in some machine learning algorithms when your data has input values with differing scales.
Standardization assumes that your observations fit a Gaussian distribution (bell curve) with a well behaved mean and standard deviation. You can still standardize your time series data if this expectation is not met, but you may not get reliable results.
Standardization requires that you know or are able to accurately estimate the mean and standard deviation of observable values. You may be able to estimate these values from your training data.
A value is standardized as follows:

    y = (x - mean) / standard_deviation

Where the mean is calculated as:

    mean = sum(x) / count(x)

And the standard_deviation is calculated as:

    standard_deviation = sqrt( sum( (x - mean)^2 ) / count(x) )

We can guesstimate a mean of 10 and a standard deviation of about 5. Using these values, we can standardize the first value of 20.7 as follows:

    y = (x - mean) / standard_deviation
    y = (20.7 - 10) / 5
    y = (10.7) / 5
    y = 2.14

The mean and standard deviation estimates of a dataset can be more robust to new data than the minimum and maximum.
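The calculation can be sketched in plain Python as shown below. The helper names `mean_and_std` and `standardize` and the `sample` list are assumptions made for illustration; the first call reproduces the worked example with a guessed mean of 10 and standard deviation of 5.

```python
from math import sqrt

def mean_and_std(values):
    """Return the mean and population standard deviation of values."""
    mean = sum(values) / len(values)
    variance = sum((x - mean) ** 2 for x in values) / len(values)
    return mean, sqrt(variance)

def standardize(x, mean, std):
    """Subtract the mean and divide by the standard deviation."""
    return (x - mean) / std

# Worked example: guessed mean of 10 and standard deviation of 5.
print(standardize(20.7, 10.0, 5.0))

# Hypothetical sample used only to show estimating the coefficients.
sample = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
mean, std = mean_and_std(sample)
print(mean, std)
standardized = [standardize(x, mean, std) for x in sample]
print(standardized)
```

Note that this divides by the number of observations, which gives the population standard deviation rather than the sample standard deviation.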
You can standardize your dataset using the scikit-learn object StandardScaler.

    from pandas import Series
    from sklearn.preprocessing import StandardScaler
    from math import sqrt
    # define contrived series
    data = [1.0, 5.5, 9.0, 2.6, 8.8, 3.0, 4.1, 7.9, 6.3]
    series = Series(data)
    print(series)
    # prepare data for standardization
    values = series.values
    values = values.reshape((len(values), 1))
    # train the standardization
    scaler = StandardScaler()
    scaler = scaler.fit(values)
    # mean_ and var_ hold one entry per column; take the first
    print('Mean: %f, StandardDeviation: %f' % (scaler.mean_[0], sqrt(scaler.var_[0])))
    # standardize the dataset and print
    standardized = scaler.transform(values)
    print(standardized)
    # inverse transform and print
    inversed = scaler.inverse_transform(standardized)
    print(inversed)

Running the example prints the sequence, prints the mean and standard deviation estimated from the sequence, prints the standardized values, then prints the values back in their original scale.
We can see that the estimated mean and standard deviation were about 5.3 and 2.7 respectively.

    0    1.0
    1    5.5
    2    9.0
    3    2.6
    4    8.8
    5    3.0
    6    4.1
    7    7.9
    8    6.3
    Mean: 5.355556, StandardDeviation: 2.712568
    [[-1.60569456]
     [ 0.05325007]
     [ 1.34354035]
     [-1.01584758]
     [ 1.26980948]
     [-0.86838584]
     [-0.46286604]
     [ 0.93802055]
     [ 0.34817357]]
    [[ 1. ]
     [ 5.5]
     [ 9. ]
     [ 2.6]
     [ 8.8]
     [ 3. ]
     [ 4.1]
     [ 7.9]
     [ 6.3]]

Scaling Input Variables
The input variables are those that the network takes on the input or visible layer in order to make a prediction.
A good rule of thumb is that input variables should be small values, probably in the range of 0-1 or standardized with a zero mean and a standard deviation of one.
Whether input variables require scaling depends on the specifics of your problem and of each variable. Let’s look at some examples.
Categorical Inputs
You may have a sequence of categorical inputs, such as letters or statuses.
Generally, categorical inputs are first integer encoded then one hot encoded. That is, a unique integer value is assigned to each distinct possible input, then a binary vector of ones and zeros is used to represent each integer value.
By definition, a one hot encoding will ensure that each input is a small real value, in this case 0.0 or 1.0.
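The two steps can be sketched in plain Python as follows. The sequence of statuses and the helper name `one_hot` are hypothetical; in practice you might use a library encoder instead.

```python
# Hypothetical sequence of categorical statuses.
sequence = ['idle', 'busy', 'busy', 'error', 'idle']

# Integer encode: assign a unique integer to each distinct value.
categories = sorted(set(sequence))
to_int = {label: i for i, label in enumerate(categories)}
integer_encoded = [to_int[label] for label in sequence]

def one_hot(index, size):
    """Return a binary vector with a single 1.0 at the given index."""
    vector = [0.0] * size
    vector[index] = 1.0
    return vector

# One hot encode: one binary vector per time step.
encoded = [one_hot(i, len(categories)) for i in integer_encoded]
for label, vector in zip(sequence, encoded):
    print(label, vector)
```

The mapping from labels to integers should be built once from the training data and reused for new data, in the same way as scaling coefficients.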
Real-Valued Inputs
You may have a sequence of quantities as inputs, such as prices or temperatures.
If the distribution of the quantity is normal, then it should be standardized; otherwise, the series should be normalized. This applies if the range of quantity values is large (10s, 100s, etc.) or small (0.01, 0.0001).
If the quantity values are small (near 0-1) and the distribution is limited (e.g. standard deviation near 1) then perhaps you can get away with no scaling of the series.
Other Inputs
Problems can be complex and it may not be clear how to best scale input data.
If in doubt, normalize the input sequence. If you have the resources, explore modeling with the raw data, standardized data, and normalized data and see if there is a beneficial difference.
If the input variables are combined linearly, as in an MLP [Multilayer Perceptron], then it is rarely strictly necessary to standardize the inputs, at least in theory. … However, there are a variety of practical reasons why standardizing the inputs can make training faster and reduce the chances of getting stuck in local optima.
— Should I normalize/standardize/rescale the data? Neural Nets FAQ
Scaling Output Variables
The output variable is the variable predicted by the network.
You must ensure that the scale of your output variable matches the scale of the activation function (transfer function) on the output layer of your network.
If your output activation function has a range of [0,1], then obviously you must ensure that the target values lie within that range. But it is generally better to choose an output activation function suited to the distribution of the targets than to force your data to conform to the output activation function.
— Should I normalize/standardize/rescale the data? Neural Nets FAQ
The following heuristics should cover most sequence prediction problems:
Binary Classification Problem
If your problem is a binary classification problem, then the output will be class values 0 and 1. This is best modeled with a sigmoid activation function on the output layer. Output values will be real values between 0 and 1 that can be snapped to crisp values.
Multiclass Classification Problem
If your problem is a multiclass classification problem, then the output will be a vector of binary class values between 0 and 1, one output per class value. This is best modeled with a softmax activation function on the output layer. Again, output values will be real values between 0 and 1 that can be snapped to crisp values.
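Snapping real-valued outputs to crisp class values can be sketched as below. The output values are made up for illustration, and the 0.5 threshold for the binary case is a common assumption rather than a requirement.

```python
# Hypothetical sigmoid output for a binary problem: one value between 0 and 1.
binary_output = 0.83
# Snap to a crisp class value using an assumed threshold of 0.5.
binary_class = 1 if binary_output >= 0.5 else 0
print(binary_class)

# Hypothetical softmax output for a three-class problem: one value per class.
multiclass_output = [0.10, 0.70, 0.20]
# Snap to the index of the largest output value.
predicted_class = max(range(len(multiclass_output)),
                      key=lambda i: multiclass_output[i])
print(predicted_class)
```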
Regression Problem
If your problem is a regression problem, then the output will be a real value. This is best modeled with a linear activation function. If the distribution of the value is normal, then you can standardize the output variable. Otherwise, the output variable can be normalized.
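For a normalized regression target, predictions come out in scaled units and need to be converted back before reporting. A minimal sketch, assuming hypothetical target values and a made-up scaled prediction:

```python
# Hypothetical training targets for a regression problem.
targets = [12.0, 15.5, 20.0, 18.0]

# Estimate the normalization coefficients from the training targets.
y_min, y_max = min(targets), max(targets)
scaled_targets = [(y - y_min) / (y_max - y_min) for y in targets]
print(scaled_targets)

# Made-up network prediction in normalized units.
prediction_scaled = 0.5
# Invert the normalization to report the prediction in the original units.
prediction = prediction_scaled * (y_max - y_min) + y_min
print(prediction)
```

With a linear output activation the network is free to predict values slightly outside the range of 0 and 1, and the inverse calculation still applies.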
Other Problem
There are many other activation functions that may be used on the output layer and the specifics of your problem may add confusion.
The rule of thumb is to ensure that the network outputs match the scale of your data.
Practical Considerations When Scaling
There are some practical considerations when scaling sequence data.
 Estimate Coefficients. You can estimate coefficients (min and max values for normalization or mean and standard deviation for standardization) from the training data. Inspect these first-cut estimates and use domain knowledge or domain experts to help improve these estimates so that they will be usefully correct on all data in the future.
 Save Coefficients. You will need to normalize new data in the future in exactly the same way as the data used to train your model. Save the coefficients used to file and load them later when you need to scale new data when making predictions.
 Data Analysis. Use data analysis to help you better understand your data. For example, a simple histogram can help you quickly get a feeling for the distribution of quantities to see if standardization would make sense.
 Scale Each Series. If your problem has multiple series, treat each as a separate variable and in turn scale them separately.
 Scale At The Right Time. It is important to apply any scaling transforms at the right time. For example, if you have a series of quantities that is non-stationary, it may be appropriate to scale after first making your data stationary. It would not be appropriate to scale the series after it has been transformed into a supervised learning problem as each column would be handled differently, which would be incorrect.
 Scale if in Doubt. You probably do need to rescale your input and output variables. If in doubt, at least normalize your data.
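Two of the considerations above, scaling each series separately and saving the coefficients, can be sketched together as follows. The series names, values, file name, and the helper `normalize_value` are all hypothetical, and JSON is only one possible file format.

```python
import json
import os
import tempfile

# Hypothetical training data with two series, each scaled separately.
training = {
    'temperature': [12.0, 18.5, 25.0, 21.0],
    'price': [100.0, 250.0, 175.0, 400.0],
}

# Estimate min and max for each series from the training data.
coefficients = {name: {'min': min(values), 'max': max(values)}
                for name, values in training.items()}

# Save the coefficients to file (a temporary directory is used for illustration).
path = os.path.join(tempfile.mkdtemp(), 'scaling_coefficients.json')
with open(path, 'w') as handle:
    json.dump(coefficients, handle)

# Later, when making predictions: load the coefficients and reuse them.
with open(path) as handle:
    loaded = json.load(handle)

def normalize_value(name, x):
    """Normalize x using the saved coefficients for the named series."""
    c = loaded[name]
    return (x - c['min']) / (c['max'] - c['min'])

print(normalize_value('temperature', 18.5))
print(normalize_value('price', 250.0))
```

If you use a scikit-learn scaler object instead, the fitted object itself can be saved and loaded rather than the individual coefficients.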
Summary
In this tutorial, you discovered how to scale your sequence prediction data when working with Long Short-Term Memory recurrent neural networks.
Specifically, you learned:
 How to normalize and standardize sequence data in Python.
 How to select the appropriate scaling for input and output variables.
 Practical considerations when scaling sequence data.
Do you have any questions about scaling sequence prediction data?
Ask your question in the comments and I will do my best to answer.