Sequence prediction is a problem that involves using historical sequence information to predict the next value or values in the sequence.

The sequence may be symbols like letters in a sentence or real values like those in a time series of prices. Sequence prediction may be easiest to understand in the context of time series forecasting as the problem is already generally understood.

In this post, you will discover the standard sequence prediction models that you can use to frame your own sequence prediction problems.

After reading this post, you will know:

- How sequence prediction problems are modeled with recurrent neural networks.
- The 4 standard sequence prediction models used by recurrent neural networks.
- The 2 most common misunderstandings made by beginners when applying sequence prediction models.

Let’s get started.

## Tutorial Overview

This tutorial is divided into 4 parts; they are:

- Sequence Prediction with Recurrent Neural Networks
- Models for Sequence Prediction
- Cardinality from Timesteps not Features
- Two Common Misunderstandings by Practitioners

## Sequence Prediction with Recurrent Neural Networks

Recurrent Neural Networks, like Long Short-Term Memory (LSTM) networks, are designed for sequence prediction problems.

In fact, at the time of writing, LSTMs achieve state-of-the-art results in challenging sequence prediction problems like neural machine translation (translating English to French).

LSTMs work by learning a function (f(…)) that maps input sequence values (X) onto output sequence values (y).

The learned mapping function is static and may be thought of as a program that takes input variables and uses internal variables. Internal variables are represented by an internal state maintained by the network and built up or accumulated over each value in the input sequence.

… RNNs combine the input vector with their state vector with a fixed (but learned) function to produce a new state vector. This can in programming terms be interpreted as running a fixed program with certain inputs and some internal variables.

— Andrej Karpathy, The Unreasonable Effectiveness of Recurrent Neural Networks, 2015

The static mapping function may be defined with a different number of inputs or outputs, as we will review in the next section.

## Models for Sequence Prediction

In this section, will review the 4 primary models for sequence prediction.

We will use the following terminology:

- X: The input sequence value, may be delimited by a time step, e.g. X(1).
- u: The hidden state value, may be delimited by a time step, e.g. u(1).
- y: The output sequence value, may be delimited by a time step, e.g. y(1).

### One-to-One Model

A one-to-one model produces one output value for each input value.

The internal state for the first time step is zero; from that point onward, the internal state is accumulated over the prior time steps.

In the case of a sequence prediction, this model would produce one time step forecast for each observed time step received as input.

This is a poor use for RNNs as the model has no chance to learn over input or output time steps (e.g. BPTT). If you find implementing this model for sequence prediction, you may intend to be using a many-to-one model instead.

### One-to-Many Model

A one-to-many model produces multiple output values for one input value.

The internal state is accumulated as each value in the output sequence is produced.

This model can be used for image captioning where one image is provided as input and a sequence of words are generated as output.

### Many-to-One Model

A many-to-one model produces one output value after receiving multiple input values.

The internal state is accumulated with each input value before a final output value is produced.

In the case of time series, this model would use a sequence of recent observations to forecast the next time step. This architecture would represent the classical autoregressive time series model.

### Many-to-Many Model

A many-to-many model produces multiple outputs after receiving multiple input values.

As with the many-to-one case, state is accumulated until the first output is created, but in this case multiple time steps are output.

Importantly, the number of input time steps do not have to match the number of output time steps. Think of the input and output time steps operating at different rates.

In the case of time series forecasting, this model would use a sequence of recent observations to make a multi-step forecast.

In a sense, it combines the capabilities of the many-to-one and one-to-many models.

## Cardinality from Timesteps (not Features!)

A common point of confusion is to conflate the above examples of sequence mapping models with multiple input and output features.

A sequence may be comprised of single values, one for each time step.

Alternately, a sequence could just as easily represent a vector of multiple observations at the time step. Each item in the vector for a time step may be thought of as its own separate time series. It does not affect the description of the models above.

For example, a model that takes as input one time step of temperature and pressure and predicts one time step of temperature and pressure is a one-to-one model, not a many-to-many model.

The model does take two values as input and predicts two values, but there is only a single sequence time step expressed for the input and predicted as output.

The cardinality of the sequence prediction models defined above refers to time steps, not features (e.g. univariate or multivariate sequences).

## Two Common Misunderstandings by Practitioners

The confusion of features vs time steps leads to two main misunderstandings when implementing recurrent neural networks by practitioners:

**1. Timesteps as Input Features**

Observations at previous timesteps are framed as input features to the model.

This is the classical fixed-window-based approach of inputting sequence prediction problems used by multilayer Perceptrons. Instead, the sequence should be fed in one time step at a time.

This confusion may lead you to think you have implemented a many-to-one or many-to-many sequence prediction model when in fact you only have a single vector input for one time step.

**2. Timesteps as Output Features**

Predictions at multiple future time steps are framed as output features to the model.

This is the classical fixed-window approach of making multi-step predictions used by multilayer Perceptrons and other machine learning algorithms. Instead, the sequence predictions should be generated one time step at a time.

This confusion may lead you to think you have implemented a one-to-many or many-to-many sequence prediction model when in fact you only have a single vector output for one time step (e.g. seq2vec not seq2seq).

Note: framing timesteps as features in sequence prediction problems is a valid strategy, and could lead to improved performance even when using recurrent neural networks (try it!). The important point here is to understand the common pitfalls and not trick yourself when framing your own prediction problems.

## Further Reading

This section provides more resources on the topic if you are looking go deeper.

## Summary

In this tutorial, you discovered the standard models for sequence prediction with recurrent neural networks.

Specifically, you learned:

- How sequence prediction problems are modeled with recurrent neural networks.
- The 4 standard sequence prediction models used by recurrent neural networks.
- The 2 most common misunderstandings made by beginners when applying sequence prediction models.

Do you have any questions?

Ask your questions in the comments below and I will do my best to answer.