Not all classification predictive models support multiclass classification.
Algorithms such as the Perceptron, Logistic Regression, and Support Vector Machines were designed for binary classification and do not natively support classification tasks with more than two classes.
One approach for using binary classification algorithms for multi-class classification problems is to split the multi-class classification dataset into multiple binary classification datasets and fit a binary classification model on each. Two different examples of this approach are the One-vs-Rest and One-vs-One strategies.
In this tutorial, you will discover the One-vs-Rest and One-vs-One strategies for multi-class classification.
After completing this tutorial, you will know:
 Binary classification models like logistic regression and SVM do not support multi-class classification natively and require meta-strategies.
 The One-vs-Rest strategy splits a multi-class classification into one binary classification problem per class.
 The One-vs-One strategy splits a multi-class classification into one binary classification problem for each pair of classes.
Let’s get started.
Tutorial Overview
This tutorial is divided into three parts; they are:
 Binary Classifiers for Multi-Class Classification
 One-vs-Rest for Multi-Class Classification
 One-vs-One for Multi-Class Classification
Binary Classifiers for Multi-Class Classification
Classification is a predictive modeling problem that involves assigning a class label to an example.
Binary classification tasks are those where examples are assigned exactly one of two classes. Multi-class classification tasks are those where examples are assigned exactly one of more than two classes.
 Binary Classification: Classification tasks with two classes.
 Multi-Class Classification: Classification tasks with more than two classes.
Some algorithms are designed for binary classification problems. Examples include:
 Logistic Regression
 Perceptron
 Support Vector Machines
As such, they cannot be used for multiclass classification tasks, at least not directly.
Instead, heuristic methods can be used to split a multi-class classification problem into multiple binary classification datasets and train a binary classification model on each.
Two examples of these heuristic methods include:
 One-vs-Rest (OvR)
 One-vs-One (OvO)
Let’s take a closer look at each.
One-vs-Rest for Multi-Class Classification
One-vs-Rest (OvR for short, also referred to as One-vs-All or OvA) is a heuristic method for using binary classification algorithms for multi-class classification.
It involves splitting the multi-class dataset into multiple binary classification problems. A binary classifier is then trained on each binary classification problem and predictions are made using the model that is the most confident.
For example, consider a multi-class classification problem with examples for each of three classes: ‘red,’ ‘blue,’ and ‘green.’ This could be divided into three binary classification datasets as follows:
 Binary Classification Problem 1: red vs [blue, green]
 Binary Classification Problem 2: blue vs [red, green]
 Binary Classification Problem 3: green vs [red, blue]
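As a quick sketch of how this split can be constructed by hand (using made-up labels, not code from this tutorial), each class gets its own binary target vector in which that class is positive and every other class is negative:

```python
import numpy as np

# hypothetical multi-class labels for a handful of examples
y = np.array(["red", "blue", "green", "red", "green", "blue"])

# one-vs-rest: build one binary target vector per class,
# marking that class as 1 and every other class as 0
binary_targets = {c: (y == c).astype(int) for c in np.unique(y)}

print(binary_targets["red"])  # [1 0 0 1 0 0]
```

A binary classifier would then be fit on each of these target vectors against the same input features.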
A possible downside of this approach is that it requires one model to be created for each class. For example, three classes require three models. This could be an issue for large datasets (e.g. millions of rows), slow models (e.g. neural networks), or very large numbers of classes (e.g. hundreds of classes).
The obvious approach is to use a one-versus-the-rest approach (also called one-vs-all), in which we train C binary classifiers, fc(x), where the data from class c is treated as positive, and the data from all the other classes is treated as negative.
— Page 503, Machine Learning: A Probabilistic Perspective, 2012.
This approach requires that each model predicts a class membership probability or a probability-like score. The argmax of these scores (the class index with the largest score) is then used to predict a class.
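As a small illustration (with made-up scores, not output from a fitted model), picking the class via argmax looks like this:

```python
import numpy as np

# hypothetical confidence scores from three one-vs-rest binary models
# for a single example: red-vs-rest, blue-vs-rest, green-vs-rest
classes = ["red", "blue", "green"]
scores = np.array([0.15, 0.70, 0.40])

# predict the class whose binary model is most confident
predicted = classes[int(np.argmax(scores))]
print(predicted)  # blue
```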
This approach is commonly used for algorithms that naturally predict numerical class membership probability or score, such as:
 Logistic Regression
 Perceptron
As such, the implementations of these algorithms in the scikit-learn library use the OvR strategy by default when they are applied to multi-class classification.
We can demonstrate this with an example on a 3-class classification problem using the LogisticRegression algorithm. The strategy for handling multi-class classification can be set via the “multi_class” argument and can be set to “ovr” for the one-vs-rest strategy.
The complete example of fitting a logistic regression model for multi-class classification using the built-in one-vs-rest strategy is listed below.

# logistic regression for multi-class classification using built-in one-vs-rest
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
# define dataset
X, y = make_classification(n_samples=1000, n_features=10, n_informative=5, n_redundant=5, n_classes=3, random_state=1)
# define model
model = LogisticRegression(multi_class='ovr')
# fit model
model.fit(X, y)
# make predictions
yhat = model.predict(X)
The scikit-learn library also provides a separate OneVsRestClassifier class that allows the one-vs-rest strategy to be used with any classifier.
This class can be used with a binary classifier like Logistic Regression or the Perceptron for multi-class classification, or even with classifiers that natively support multi-class classification.
It is very easy to use: the classifier to be used for binary classification is simply provided to the OneVsRestClassifier as an argument.
The example below demonstrates how to use the OneVsRestClassifier class with a LogisticRegression class used as the binary classification model.

# logistic regression for multi-class classification using one-vs-rest
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier
# define dataset
X, y = make_classification(n_samples=1000, n_features=10, n_informative=5, n_redundant=5, n_classes=3, random_state=1)
# define model
model = LogisticRegression()
# define the ovr strategy
ovr = OneVsRestClassifier(model)
# fit model
ovr.fit(X, y)
# make predictions
yhat = ovr.predict(X)
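One way to sanity-check the wrapper (not part of the original listing) is to evaluate it with k-fold cross-validation and confirm that it fits one binary model per class via its estimators_ attribute:

```python
# evaluate the one-vs-rest wrapper with 5-fold cross-validation
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.multiclass import OneVsRestClassifier

# same synthetic 3-class dataset as in the listing above
X, y = make_classification(n_samples=1000, n_features=10, n_informative=5,
                           n_redundant=5, n_classes=3, random_state=1)
ovr = OneVsRestClassifier(LogisticRegression())
scores = cross_val_score(ovr, X, y, scoring='accuracy', cv=5)
print('Mean accuracy: %.3f' % scores.mean())

# after fitting, there is one binary model per class
ovr.fit(X, y)
print(len(ovr.estimators_))  # 3
```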
One-vs-One for Multi-Class Classification
One-vs-One (OvO for short) is another heuristic method for using binary classification algorithms for multi-class classification.
Like one-vs-rest, one-vs-one splits a multi-class classification dataset into binary classification problems. Unlike one-vs-rest, which creates one binary dataset for each class, the one-vs-one approach creates one binary dataset for each pair of classes.
For example, consider a multi-class classification problem with four classes: ‘red,’ ‘blue,’ ‘green,’ and ‘yellow.’ This could be divided into six binary classification datasets as follows:
 Binary Classification Problem 1: red vs. blue
 Binary Classification Problem 2: red vs. green
 Binary Classification Problem 3: red vs. yellow
 Binary Classification Problem 4: blue vs. green
 Binary Classification Problem 5: blue vs. yellow
 Binary Classification Problem 6: green vs. yellow
This is significantly more datasets, and in turn models, than the one-vs-rest strategy described in the previous section.
The formula for calculating the number of binary datasets, and in turn, models, is as follows:
 (NumClasses * (NumClasses – 1)) / 2
We can see that for four classes, this gives us the expected value of six binary classification problems:
 (NumClasses * (NumClasses – 1)) / 2
 (4 * (4 – 1)) / 2
 (4 * 3) / 2
 12 / 2
 6
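The formula above translates directly into a one-line helper (a sketch, not from the original tutorial):

```python
def num_ovo_models(num_classes: int) -> int:
    """Number of pairwise binary models required by one-vs-one."""
    return (num_classes * (num_classes - 1)) // 2

print(num_ovo_models(4))    # 6
print(num_ovo_models(100))  # 4950
```

Note how quickly the model count grows with the number of classes.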
Each binary classification model predicts one class label, and the class with the most predictions or votes is predicted by the one-vs-one strategy.
An alternative is to introduce K(K − 1)/2 binary discriminant functions, one for every possible pair of classes. This is known as a one-versus-one classifier. Each point is then classified according to a majority vote amongst the discriminant functions.
— Page 183, Pattern Recognition and Machine Learning, 2006.
Similarly, if the binary classification models predict a numerical class membership, such as a probability, then the argmax of the sum of the scores (the class with the largest summed score) is predicted as the class label.
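A sketch of this score-summing scheme on the four-class example (with made-up pairwise scores, assuming each pairwise model returns the probability of the first class in the pair):

```python
classes = ["red", "blue", "green", "yellow"]

# hypothetical scores from the six pairwise models for one example;
# each entry is the probability of the first class in the pair
pair_scores = {
    ("red", "blue"): 0.9, ("red", "green"): 0.8, ("red", "yellow"): 0.7,
    ("blue", "green"): 0.4, ("blue", "yellow"): 0.2, ("green", "yellow"): 0.7,
}

# sum each class's scores across every pair it appears in
totals = {c: 0.0 for c in classes}
for (a, b), score_a in pair_scores.items():
    totals[a] += score_a        # score for the first class of the pair
    totals[b] += 1.0 - score_a  # complement goes to the second class

# the argmax of the summed scores is the predicted class
predicted = max(totals, key=totals.get)
print(predicted)  # red
```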
Classically, this approach is suggested for support vector machines (SVM) and related kernel-based algorithms. This is because the cost of training kernel methods scales poorly with the size of the training dataset, and training on smaller subsets of the data may counter this effect.
The support vector machine implementation in scikit-learn is provided by the SVC class and supports the one-vs-one method for multi-class classification problems. This can be achieved by setting the “decision_function_shape” argument to ‘ovo.’
The example below demonstrates SVM for multi-class classification using the one-vs-one method.

# SVM for multi-class classification using built-in one-vs-one
from sklearn.datasets import make_classification
from sklearn.svm import SVC
# define dataset
X, y = make_classification(n_samples=1000, n_features=10, n_informative=5, n_redundant=5, n_classes=3, random_state=1)
# define model
model = SVC(decision_function_shape='ovo')
# fit model
model.fit(X, y)
# make predictions
yhat = model.predict(X)
The scikit-learn library also provides a separate OneVsOneClassifier class that allows the one-vs-one strategy to be used with any classifier.
This class can be used with a binary classifier like SVM, Logistic Regression, or the Perceptron for multi-class classification, or even with classifiers that natively support multi-class classification.
It is very easy to use: the classifier to be used for binary classification is simply provided to the OneVsOneClassifier as an argument.
The example below demonstrates how to use the OneVsOneClassifier class with an SVC class used as the binary classification model.

# SVM for multi-class classification using one-vs-one
from sklearn.datasets import make_classification
from sklearn.svm import SVC
from sklearn.multiclass import OneVsOneClassifier
# define dataset
X, y = make_classification(n_samples=1000, n_features=10, n_informative=5, n_redundant=5, n_classes=3, random_state=1)
# define model
model = SVC()
# define ovo strategy
ovo = OneVsOneClassifier(model)
# fit model
ovo.fit(X, y)
# make predictions
yhat = ovo.predict(X)
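As a quick check (not part of the original listing), the number of fitted pairwise models can be confirmed via the estimators_ attribute; for three classes the formula gives (3 × 2) / 2 = 3:

```python
from sklearn.datasets import make_classification
from sklearn.svm import SVC
from sklearn.multiclass import OneVsOneClassifier

# same synthetic 3-class dataset as in the listing above
X, y = make_classification(n_samples=1000, n_features=10, n_informative=5,
                           n_redundant=5, n_classes=3, random_state=1)
ovo = OneVsOneClassifier(SVC())
ovo.fit(X, y)

# one fitted model per pair of classes
print(len(ovo.estimators_))  # 3
```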
Summary
In this tutorial, you discovered the One-vs-Rest and One-vs-One strategies for multi-class classification.
Specifically, you learned:
 Binary classification models like logistic regression and SVM do not support multi-class classification natively and require meta-strategies.
 The One-vs-Rest strategy splits a multi-class classification into one binary classification problem per class.
 The One-vs-One strategy splits a multi-class classification into one binary classification problem for each pair of classes.
Do you have any questions?
Ask your questions in the comments below and I will do my best to answer.