
Regression refers to predictive modeling problems that involve predicting a numeric value.

It is different from classification, which involves predicting a class label. Unlike classification, you cannot use classification accuracy to evaluate the predictions made by a regression model.

Instead, you must use error metrics specifically designed for evaluating predictions made on regression problems.

In this tutorial, you will discover how to calculate **error metrics for regression** predictive modeling projects.

After completing this tutorial, you will know:

- Regression predictive modeling problems are those that involve predicting a numeric value.
- Metrics for regression involve calculating an error score to summarize the predictive skill of a model.
- How to calculate and report mean squared error, root mean squared error, and mean absolute error.

Let’s get started.

This tutorial is divided into three parts; they are:

- Regression Predictive Modeling
- Evaluating Regression Models
- Metrics for Regression
    - Mean Squared Error
    - Root Mean Squared Error
    - Mean Absolute Error

Predictive modeling is the problem of developing a model using historical data to make a prediction on new data where we do not have the answer.

Predictive modeling can be described as the mathematical problem of approximating a mapping function (f) from input variables (X) to output variables (y). This is called the problem of function approximation.

The job of the modeling algorithm is to find the best mapping function we can given the time and resources available.

For more on approximating functions in applied machine learning, see the post:

Regression predictive modeling is the task of approximating a mapping function (*f*) from input variables (*X*) to a continuous output variable (*y*).

Regression is different from classification, which involves predicting a category or class label.

For more on the difference between classification and regression, see the tutorial:

A continuous output variable is a real value, such as an integer or floating point value. These are often quantities, such as amounts and sizes.

For example, a house may be predicted to sell for a specific dollar value, perhaps in the range of $100,000 to $200,000.

- A regression problem requires the prediction of a quantity.
- A regression problem can have real-valued or discrete input variables.
- A problem with multiple input variables is often called a multivariate regression problem.
- A regression problem where input variables are ordered by time is called a time series forecasting problem.

Now that we are familiar with regression predictive modeling, let’s look at how we might evaluate a regression model.

A common question from beginners to regression predictive modeling projects is:

How do I calculate accuracy for my regression model?

Accuracy (e.g. classification accuracy) is a measure for classification, not regression.

**We cannot calculate accuracy for a regression model**.

The skill or performance of a regression model must be reported as an error in those predictions.

This makes sense if you think about it. If you are predicting a numeric value like a height or a dollar amount, you don't want to know whether the model predicted the value exactly (this might be intractably difficult in practice); instead, you want to know how close the predictions were to the expected values.

Error addresses exactly this and summarizes on average how close predictions were to their expected values.

There are three error metrics that are commonly used for evaluating and reporting the performance of a regression model; they are:

- Mean Squared Error (MSE).
- Root Mean Squared Error (RMSE).
- Mean Absolute Error (MAE).

There are many other metrics for regression, although these are the most commonly used. You can see the full list of regression metrics supported by the scikit-learn Python machine learning library here:

In the next section, let’s take a closer look at each in turn.

In this section, we will take a closer look at the popular metrics for regression models and how to calculate them for your predictive modeling project.

Mean Squared Error, or MSE for short, is a popular error metric for regression problems.

It is also an important loss function for algorithms fit or optimized using the least squares framing of a regression problem. Here “*least squares*” refers to minimizing the mean squared error between predictions and expected values.

The MSE is calculated as the mean or average of the squared differences between predicted and expected target values in a dataset.

- MSE = 1 / N * sum for i to N (y_i - yhat_i)^2

Where *y_i* is the i’th expected value in the dataset and *yhat_i* is the i’th predicted value. The difference between these two values is squared, which has the effect of removing the sign, resulting in a positive error value.

The squaring also has the effect of inflating or magnifying large errors. That is, the larger the difference between the predicted and expected values, the larger the resulting squared positive error. This has the effect of “*punishing*” models more for larger errors when MSE is used as a loss function. It also has the effect of “*punishing*” models by inflating the average error score when used as a metric.
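The formula above can be written directly as a small pure-Python function (a minimal sketch; the function name and the example values are illustrative):

```python
def mse(expected, predicted):
    # mean of the squared differences between expected and predicted values
    n = len(expected)
    return sum((e - p) ** 2 for e, p in zip(expected, predicted)) / n

# a perfect prediction gives an error of 0.0
print(mse([1.0, 1.0, 1.0], [1.0, 1.0, 1.0]))  # 0.0
# one prediction off by 0.3 contributes 0.3^2 = 0.09 to the sum
print(mse([1.0, 1.0, 1.0], [1.0, 1.0, 0.7]))
```

Note that a single squared difference of 0.09 averaged over three examples gives an MSE of 0.03.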

We can create a plot to get a feeling for how the change in prediction error impacts the squared error.

The example below gives a small contrived dataset of all 1.0 values and predictions that range from perfect (1.0) to wrong (0.0) by 0.1 increments. The squared error between each prediction and expected value is calculated and plotted to show the quadratic increase in squared error.

```python
...
# calculate error
err = (expected[i] - predicted[i])**2
```

The complete example is listed below.

```python
# example of increase in mean squared error
from matplotlib import pyplot

# real value
expected = [1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0]
# predicted value
predicted = [1.0, 0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.0]
# calculate errors
errors = list()
for i in range(len(expected)):
    # calculate error
    err = (expected[i] - predicted[i])**2
    # store error
    errors.append(err)
    # report error
    print('>%.1f, %.1f = %.3f' % (expected[i], predicted[i], err))
# plot errors
pyplot.plot(errors)
pyplot.xticks(ticks=[i for i in range(len(errors))], labels=predicted)
pyplot.xlabel('Predicted Value')
pyplot.ylabel('Mean Squared Error')
pyplot.show()
```

Running the example first reports the expected value, predicted value, and squared error for each case.

We can see that the error rises quickly, faster than linear (a straight line).

```
>1.0, 1.0 = 0.000
>1.0, 0.9 = 0.010
>1.0, 0.8 = 0.040
>1.0, 0.7 = 0.090
>1.0, 0.6 = 0.160
>1.0, 0.5 = 0.250
>1.0, 0.4 = 0.360
>1.0, 0.3 = 0.490
>1.0, 0.2 = 0.640
>1.0, 0.1 = 0.810
>1.0, 0.0 = 1.000
```

A line plot is created showing the curved or super-linear increase in the squared error value as the difference between the expected and predicted value is increased.

The curve is not a straight line as we might naively assume for an error metric.

The individual error terms are averaged so that we can report the performance of a model with regard to how much error the model makes generally when making predictions, rather than specifically for a given example.

The units of the MSE are squared units.

For example, if your target value represents “*dollars*,” then the MSE will be “*squared dollars*.” This can be confusing for stakeholders; therefore, when reporting results, often the root mean squared error is used instead (*discussed in the next section*).

The mean squared error between your expected and predicted values can be calculated using the mean_squared_error() function from the scikit-learn library.

The function takes a one-dimensional array or list of expected values and predicted values and returns the mean squared error value.

```python
...
# calculate errors
errors = mean_squared_error(expected, predicted)
```

The example below calculates the mean squared error between contrived lists of expected and predicted values.

```python
# example of calculating the mean squared error
from sklearn.metrics import mean_squared_error

# real value
expected = [1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0]
# predicted value
predicted = [1.0, 0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.0]
# calculate errors
errors = mean_squared_error(expected, predicted)
# report error
print(errors)
```

Running the example calculates and prints the mean squared error.

A perfect mean squared error value is 0.0, which means that all predictions matched the expected values exactly.

This is almost never the case, and if it happens, it suggests your predictive modeling problem is trivial.

A good MSE is relative to your specific dataset.

It is a good idea to first establish a baseline MSE for your dataset using a naive predictive model, such as predicting the mean target value from the training dataset. A model that achieves an MSE better than the MSE for the naive model has skill.
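The baseline idea can be sketched as follows, assuming a naive model that always predicts the training mean (the target values here are contrived for illustration):

```python
# hypothetical training targets and held-out test targets (contrived values)
train_y = [2.0, 3.0, 4.0, 5.0]
test_y = [3.0, 4.0]

# naive model: always predict the mean of the training targets
naive_pred = sum(train_y) / len(train_y)  # 3.5

# baseline MSE of the naive model on the test targets
baseline_mse = sum((y - naive_pred) ** 2 for y in test_y) / len(test_y)
print(baseline_mse)  # 0.25
```

Any model that achieves an MSE below this baseline on the same data has skill on the problem.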

The Root Mean Squared Error, or RMSE, is an extension of the mean squared error.

Importantly, the square root of the error is calculated, which means that the units of the RMSE are the same as the original units of the target value that is being predicted.

For example, if your target variable has the units “*dollars*,” then the RMSE error score will also have the unit “*dollars*” and not “*squared dollars*” like the MSE.

As such, it may be common to use MSE loss to train a regression predictive model, and to use RMSE to evaluate and report its performance.

The RMSE can be calculated as follows:

- RMSE = sqrt(1 / N * sum for i to N (y_i - yhat_i)^2)

Where *y_i* is the i’th expected value in the dataset, *yhat_i* is the i’th predicted value, and *sqrt()* is the square root function.

We can restate the RMSE in terms of the MSE as:

- RMSE = sqrt(MSE)

Note that the RMSE cannot be calculated as the average of the square roots of the individual squared error values. This is a common error made by beginners and is an example of Jensen's inequality.

You may recall that the square root is the inverse of the square operation. MSE uses the square operation to remove the sign of each error value and to punish large errors. The square root reverses this operation, although it ensures that the result remains positive.
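Under these definitions, a minimal sketch (with contrived values) shows both the correct calculation and the common mistake of averaging the square roots of the individual squared errors:

```python
from math import sqrt

expected = [1.0, 1.0, 1.0, 1.0]
predicted = [0.0, 1.0, 1.0, 1.0]

# correct: square root of the mean of the squared errors
mse = sum((e - p) ** 2 for e, p in zip(expected, predicted)) / len(expected)
rmse = sqrt(mse)
print(rmse)  # 0.5

# incorrect: the mean of the square roots of the squared errors
wrong = sum(sqrt((e - p) ** 2) for e, p in zip(expected, predicted)) / len(expected)
print(wrong)  # 0.25 -- this is actually the MAE, not the RMSE
```

The two quantities differ whenever the errors are not all equal, which is Jensen's inequality at work.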

The root mean squared error between your expected and predicted values can be calculated using the mean_squared_error() function from the scikit-learn library.

By default, the function calculates the MSE, but we can configure it to calculate the square root of the MSE by setting the “*squared*” argument to *False*.

The function takes a one-dimensional array or list of expected values and predicted values and, with this argument set, returns the root mean squared error value.

```python
...
# calculate errors
errors = mean_squared_error(expected, predicted, squared=False)
```

The example below calculates the root mean squared error between contrived lists of expected and predicted values.

```python
# example of calculating the root mean squared error
from sklearn.metrics import mean_squared_error

# real value
expected = [1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0]
# predicted value
predicted = [1.0, 0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.0]
# calculate errors
errors = mean_squared_error(expected, predicted, squared=False)
# report error
print(errors)
```

Running the example calculates and prints the root mean squared error.

A perfect RMSE value is 0.0, which means that all predictions matched the expected values exactly.

This is almost never the case, and if it happens, it suggests your predictive modeling problem is trivial.

A good RMSE is relative to your specific dataset.

It is a good idea to first establish a baseline RMSE for your dataset using a naive predictive model, such as predicting the mean target value from the training dataset. A model that achieves an RMSE better than the RMSE for the naive model has skill.

Mean Absolute Error, or MAE, is a popular metric because, like RMSE, the units of the error score match the units of the target value that is being predicted.

Unlike the RMSE, changes in the MAE are linear and therefore intuitive.

That is, MSE and RMSE punish larger errors more than smaller errors, inflating or magnifying the mean error score. This is due to the square of the error value. The MAE does not give more or less weight to different types of errors and instead the scores increase linearly with increases in error.
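This difference is easy to see with two contrived sets of predictions that have the same total absolute error, one spread evenly and one concentrated in a single large miss (a sketch with illustrative values):

```python
expected = [1.0, 1.0, 1.0, 1.0]
even = [0.5, 0.5, 0.5, 0.5]      # four errors of 0.5 each
outlier = [1.0, 1.0, 1.0, -1.0]  # one error of 2.0

def mae(y, yhat):
    # average of the absolute differences
    return sum(abs(a - b) for a, b in zip(y, yhat)) / len(y)

def mse(y, yhat):
    # average of the squared differences
    return sum((a - b) ** 2 for a, b in zip(y, yhat)) / len(y)

# identical MAE: both sets have 2.0 of total absolute error
print(mae(expected, even), mae(expected, outlier))  # 0.5 0.5
# the MSE punishes the single large error much more heavily
print(mse(expected, even), mse(expected, outlier))  # 0.25 1.0
```

The MAE treats the two prediction sets as equally bad, while the MSE rates the outlier-heavy set four times worse.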

As its name suggests, the MAE score is calculated as the average of the absolute error values. Absolute or *abs()* is a mathematical function that simply makes a number positive. Therefore, the difference between an expected and predicted value may be positive or negative and is forced to be positive when calculating the MAE.

The MAE can be calculated as follows:

- MAE = 1 / N * sum for i to N abs(y_i - yhat_i)

Where *y_i* is the i’th expected value in the dataset, *yhat_i* is the i’th predicted value and *abs()* is the absolute function.

We can create a plot to get a feeling for how the change in prediction error impacts the MAE.

The example below gives a small contrived dataset of all 1.0 values and predictions that range from perfect (1.0) to wrong (0.0) by 0.1 increments. The absolute error between each prediction and expected value is calculated and plotted to show the linear increase in error.

```python
...
# calculate error
err = abs(expected[i] - predicted[i])
```

The complete example is listed below.

```python
# plot of the increase of mean absolute error with prediction error
from matplotlib import pyplot

# real value
expected = [1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0]
# predicted value
predicted = [1.0, 0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.0]
# calculate errors
errors = list()
for i in range(len(expected)):
    # calculate error
    err = abs(expected[i] - predicted[i])
    # store error
    errors.append(err)
    # report error
    print('>%.1f, %.1f = %.3f' % (expected[i], predicted[i], err))
# plot errors
pyplot.plot(errors)
pyplot.xticks(ticks=[i for i in range(len(errors))], labels=predicted)
pyplot.xlabel('Predicted Value')
pyplot.ylabel('Mean Absolute Error')
pyplot.show()
```

Running the example first reports the expected value, predicted value, and absolute error for each case.

We can see that the error rises linearly, which is intuitive and easy to understand.

```
>1.0, 1.0 = 0.000
>1.0, 0.9 = 0.100
>1.0, 0.8 = 0.200
>1.0, 0.7 = 0.300
>1.0, 0.6 = 0.400
>1.0, 0.5 = 0.500
>1.0, 0.4 = 0.600
>1.0, 0.3 = 0.700
>1.0, 0.2 = 0.800
>1.0, 0.1 = 0.900
>1.0, 0.0 = 1.000
```

A line plot is created showing the straight line or linear increase in the absolute error value as the difference between the expected and predicted value is increased.

The mean absolute error between your expected and predicted values can be calculated using the mean_absolute_error() function from the scikit-learn library.

The function takes a one-dimensional array or list of expected values and predicted values and returns the mean absolute error value.

```python
...
# calculate errors
errors = mean_absolute_error(expected, predicted)
```

The example below calculates the mean absolute error between contrived lists of expected and predicted values.

```python
# example of calculating the mean absolute error
from sklearn.metrics import mean_absolute_error

# real value
expected = [1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0]
# predicted value
predicted = [1.0, 0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.0]
# calculate errors
errors = mean_absolute_error(expected, predicted)
# report error
print(errors)
```

Running the example calculates and prints the mean absolute error.

A perfect mean absolute error value is 0.0, which means that all predictions matched the expected values exactly.

This is almost never the case, and if it happens, it suggests your predictive modeling problem is trivial.

A good MAE is relative to your specific dataset.

It is a good idea to first establish a baseline MAE for your dataset using a naive predictive model, such as predicting the mean target value from the training dataset. A model that achieves an MAE better than the MAE for the naive model has skill.

This section provides more resources on the topic if you are looking to go deeper.

In this tutorial, you discovered how to calculate error for regression predictive modeling projects.

Specifically, you learned:

- Regression predictive modeling problems are those that involve predicting a numeric value.
- Metrics for regression involve calculating an error score to summarize the predictive skill of a model.
- How to calculate and report mean squared error, root mean squared error, and mean absolute error.

**Do you have any questions?**

Ask your questions in the comments below and I will do my best to answer.

Every month, Netflix adds dozens of new titles to its growing collection of streaming movies and TV series in the U.S. At the same time, it rotates out some of its older titles. Below, we’ve chosen the best movies and TV shows to watch before they’re removed from the streaming service in February.

**Must Watch**
*Goodfellas* (Feb. 28)

**Good Watch**
*The Other Guys* (Feb. 11)
*Hostiles* (Feb. 14)
*Easy A* (Feb. 28)
*The Gift* (2015) (Feb. 28)
*Haywire* (Feb. 28)
*LA 92* (Feb. 28)
*Retribution* (2015) (Feb. 28)

**Weepie Watch**

*A Walk to Remember* (Feb. 28)

**Binge Watch**
*Bates Motel* Seasons 1-5 (Feb. 19)

**Nostalgia Watch**
*Basic Instinct* (Feb. 28)

**Family Watch**
*Dolphin Tale 2* (Feb. 24)
*Saving Mr. Banks* (Feb. 28)

**Sweet Sounds of Clint Eastwood Watch**
*Gran Torino* (Feb. 28)

**If You’re Bored**
*Erased* (2012) (Feb. 2)
*Lila & Eve* (Feb. 5)
*Woody Woodpecker* (Feb. 5)
*Don’t Knock Twice* (Feb. 7)
*Swiped* (Feb. 7)
*A Bad Moms Christmas* (Feb. 10)
*Alone in Berlin* (Feb. 14)
*Brave Miss World*: Collection 1 (Feb. 16)
*A Haunted House* (Feb. 20)
*Trespass Against Us* (Feb. 21)
*The Frozen Ground* (Feb. 26)
*Little Nicky* (Feb. 28)
*My Little Pony Equestria Girls: Friendship Games* (Feb. 28)
*Sleepover* (2004) (Feb. 28)


My first video of a 3-part series on “coordinated inauthentic behavior”. Thanks to ExpressVPN for sponsoring this series. Get 3 months free with a 12-month plan at https://www.expressvpn.com/smarter

Click here if you’re interested in subscribing: http://bit.ly/Subscribe2SED

⇊ Click below for more links! ⇊

Test your internet connection for leaks here: https://browserleaks.com/ip

For third-party research, check out https://www.comparitech.com/vpn/vpn-leaks/

~~~~~~~~~~~~~~~~~~~~~~~~~~~~

GET SMARTER SECTION

Renée Diresta is a Mozilla Fellow in Media, Misinformation, and Trust, where she researches unintended consequences of algorithms and works towards helping machines make better decisions. Renee also writes about disinformation and the changing face of information war — check out her essay “The Digital Maginot Line” (https://www.ribbonfarm.com/2018/11/28/the-digital-maginot-line) — and is a contributor to Wired Ideas (https://www.wired.com/author/renee-diresta).

Here are examples of some of the videos in question

(Open these videos in incognito mode so you don’t mess up your algorithm.)

https://www.youtube.com/watch?v=3bfkZNh-NOE – 298k views

https://www.youtube.com/watch?v=AvA6fmfazCs – 8k views

https://www.youtube.com/watch?v=q72JmckZM8M

To say this presentation style is shady is an understatement. Quite interesting. https://www.youtube.com/watch?v=t-5_XJxJrPA – 107k views

Here’s one of the channels: https://www.youtube.com/channel/UCt0Rc8yD14JXkZUdZvOBgow

~~~~~~~~~~~~~~~~~~~~~~~~~~~~

GET STUFF SECTION:

(If I did this right these should be working Amazon affiliate links to purchase the stuff I like to use. When people purchase from these links it will support Smarter Every Day.)

❓Mystery Item (just for fun): https://amzn.to/2zDfR2X

Things I use and like:

Camera I use : https://amzn.to/2VSiruw

Favorite Lens: https://amzn.to/2KPDQ1a

Wide-angle: https://amzn.to/2SlPchR

On-camera Mic: https://amzn.to/2SJulF4

Lav Mic: https://amzn.to/3aRek6r

Hot shoe mount for Lav Receiver: https://amzn.to/35m6uAo

My Tripod: https://amzn.to/2Yl6RtJ

My Multi-tool: https://amzn.to/2zGm5Pz

Favorite SD Card: https://amzn.to/2KQ3Edz

How I get footage off my phone: https://amzn.to/2KMem4K

Travel Tripod: https://amzn.to/2zEa9Oi

My Backpack: https://amzn.to/35jveJL

My Headlamp: https://amzn.to/3deYmVt

Favorite Bidet: https://amzn.to/2xnMG3b

World Map: https://amzn.to/3aTFCZT

Favorite Shoes: https://amzn.to/3f5trfV

Everyone needs a snatchblock: https://amzn.to/2DMR4s8

🥽Goggle Up! : https://amzn.to/2zG754g

Also, if you’re interested in a Smarter Every Day shirt etc., they’re really soft and you can get them here: https://www.smartereveryday.com/store

~~~~~~~~~~~~~~~~~~~~~~~~~~

Tweet Ideas to me at:

Tweets by smartereveryday

Smarter Every Day on Facebook

https://www.facebook.com/SmarterEveryDay

Smarter Every Day on Patreon

http://www.patreon.com/smartereveryday

Smarter Every Day On Instagram

http://www.instagram.com/smartereveryday

Smarter Every Day SubReddit

http://www.reddit.com/r/smartereveryday

Ambiance and musicy things by: Gordon McGladdery, who did the outro music for the video.

http://ashellinthepit.bandcamp.com/

The thought is that my efforts making videos will help educate the world as a whole, and one day generate enough revenue to pay for my kids’ college education. Until then, if you appreciate what you’ve learned in this video and the effort that went into it, please SHARE THE VIDEO!

If you REALLY liked it, feel free to pitch a few dollars to Smarter Every Day by becoming a Patron.

http://www.patreon.com/smartereveryday

Warm Regards,

Destin

source

Volopay, a Singapore-based startup building a “financial control center” for businesses, announced today it has raised $2.1 million in seed funding. The round was led by Tinder co-founder Justin Mateen, and included participation from Soma Capital, CP Ventures, Y Combinator, VentureSouq, the founders of Razorpay and other angel investors.

The funding will be used on hiring, product development, strategic partnerships and Volopay’s international expansion. It plans to launch operations in Australia later this month. The company currently has about 100 clients, including Smart Karma, Dathena, Medline, Sensorflow and Beam.

Launched in 2019 by Rajith Shaiji and Rajesh Raikwar, Volopay took part in Y Combinator’s accelerator program last year. It was created after chief executive officer Shaji, who worked for several fintech companies before launching Volopay, became frustrated by the process of reconciling business expenses, especially with accounting departments located in different countries. Shaiji and Raikwar also saw that many companies, especially startups and SMEs, struggled to track different kinds of spending, including subscriptions and vendor payments.

Most of Volopay’s clients are in the tech sector and have about 15 to 150 employees. Volopay’s platform integrates multi-currency corporate cards (issued by VISA Corporate), domestic and international bank transfers, automated payments and expense and accounting software, allowing companies to save money on foreign exchange fees and reconcile expenses more quickly.

In order to speed up its development, Volopay integrated Airwallex’s APIs. Its corporate cards offer up to 2% cashback on software subscriptions, hosting and international travel, which Volopay says are the three top expense categories for tech companies. In November 2020, it launched a credit facility for corporate cards to help give SMEs more liquidity during the COVID-19 pandemic.

Compared to traditional credit products, like credit cards and working capital loans, Shaji said Volopay’s credit facility, which is also issued by VISA Corporate, has a more competitive fixed-fee pricing structure that depends on the level of credit used. This means companies know how much they owe in advance, which in turn helps them manage their cashflows more easily. The average credit line provided by Volopay is about $30,000.

Since TechCrunch last covered Volopay in July 2020, it has grown 70% month on month in terms of total funds flowing through its platform, Shaji said. It also launched two new features: a bill pay feature that allows clients to transfer money domestically and internationally with low foreign exchange rates and transaction fees, and the credit facility. The bill pay feature now contributes about 40% to Volopay’s total payment volume, while the credit product makes up 30% of its card spending.

Shaji told TechCrunch that Volopay decided to expand into Australia because not only is it a much larger market than Singapore, but “SMEs in Australia are very comfortable using paid digital software to streamline internal operations and scale their businesses.” He added that there is currently no other provider in Australia that offers both expense management and credit to SMEs like Volopay.

President Donald Trump’s personal lawyer Rudy Giuliani won’t be part of the defense team in the Senate trial for his second impeachment. Giuliani, who has been leading the president’s failed efforts to prove baseless claims of voter fraud, said he wouldn’t represent Trump because he was involved in the Jan. 6 rally that preceded the riot at the Capitol. “Because I gave an earlier speech [at the rally], I am a witness and therefore unable to participate in court or in the Senate chamber,” Giuliani told ABC News.

Giuliani issued a statement saying he “may be a witness” and that means “the rules of legal ethics would prohibit me from representing the president as trial counsel in the impeachment trial.” At the Jan. 6 rally, Giuliani fired up the crowd by calling for “trial by combat.” He later defended those words by saying they were a reference to the television show *Game of Thrones*.

Watch John Eastman (left) break into a smile when Rudy Giuliani declares: “Let’s have trial by combat!” pic.twitter.com/ekvweCdhsq

— Mark Joseph Stern (@mjs_DC) January 13, 2021

The announcement comes after a bit of back-and-forth over the weekend. Giuliani had said he was working on the president’s defense and said he was ready to argue the president’s case. Giuliani said there were disagreements about how Trump should be defended but said that he wanted to argue the validity of voter fraud allegations. But Trump’s team had other ideas and a spokesman quickly tweeted out a statement saying the president had yet to decide “which lawyer or law firm” would represent him.

Statement On President Trump’s Impeachment Defense Team:

President Trump has not yet made a determination as to which lawyer or law firm will represent him for the disgraceful attack on our Constitution and democracy, known as the “impeachment hoax.” We will keep you informed.

Trump reportedly started telling people around him Sunday that Giuliani would not represent him. That was the same day as veteran Republican strategist Karl Rove said in a television interview that Trump’s chances of conviction would increase if he was defended by Giuliani. It’s still unclear who will actually represent the president considering many attorneys have refused to take his case.

Karl Rove says Rudy Giuliani’s impeachment defense that “the attack on the Capitol and the attempt to end [Congress] certifying the election was justified” because their false election claims are true “raises the likelihood of more than 17 Republicans voting for conviction” pic.twitter.com/hf4JeerOFB

— Justin Baragona (@justinbaragona) January 17, 2021

Virgin Orbit launched its LauncherOne rocket to orbit for the first time today, with a successful demonstration mission that carried a handful of satellites and delivered them successfully to low Earth orbit on behalf of NASA. It’s a crucial milestone for the small satellite launch company, and the first time the company has shown that its hybrid carrier aircraft/small payload orbital delivery rocket works as intended, which should set the company up to begin commercial operations of its launch system very soon.

This is the second attempt at reaching orbit for Virgin Orbit, after a first try in late May ended with the LauncherOne rocket initiating an automatic safety shutdown of its engines shortly after detaching from the ‘Cosmic Girl’ carrier aircraft, a modified Boeing 747 that transports the rocket to its launch altitude. The company said that it learned a lot from that attempt, including identifying the error that caused the failsafe engine shut down, which it corrected in advance of today’s mission.

Virgin’s Cosmic Girl took off just before 2 PM EDT, and then released LauncherOne from its wing at roughly 2:40 PM EDT. LauncherOne had a “clean separation” as intended, and then ignited its own rocket engines and quickly accelerated to the point where it was undergoing the maximum amount of aerodynamic pressure (called max q in the aerospace industry). LauncherOne’s main engine then cut off after its burn, and its payload stage separated, crossing the Karman line and entering space for the first time.

It achieved orbit at around 2:49 PM EDT, and released its payload of satellites to their target orbit sometime later on schedule, making the mission a complete success.

Virgin Orbit’s unique value proposition in the small launch market is that it can take off and land from traditional runways thanks to its carrier aircraft and mid-air rocket launch approach. That should provide flexibility in terms of launch locations, allowing it to be more responsive to customer needs in terms of geographies and target orbital deliveries.

In 2017, Virgin Orbit was spun out of Virgin Galactic to focus exclusively on small payload orbital launch. Virgin Galactic then devoted itself entirely to its own mission of offering commercial human spaceflight. Virgin Orbit itself created its own subsidiary earlier this year, called VOX Space, which intends to use LauncherOne to deliver small satellites to orbit specifically for the U.S. national security market.

**Activation functions** are a critical part of the design of a neural network.

The choice of activation function in the hidden layer will control how well the network model learns the training dataset. The choice of activation function in the output layer will define the type of predictions the model can make.

As such, a careful choice of activation function must be made for each deep learning neural network project.

In this tutorial, you will discover how to choose activation functions for neural network models.

After completing this tutorial, you will know:

- Activation functions are a key part of neural network design.
- The modern default activation function for hidden layers is the ReLU function.
- The activation function for output layers depends on the type of prediction problem.

Let’s get started.

This tutorial is divided into three parts; they are:

- Activation Functions
- Activation for Hidden Layers
- Activation for Output Layers

An activation function in a neural network defines how the weighted sum of the input is transformed into an output from a node or nodes in a layer of the network.

Sometimes the activation function is called a “*transfer function*.” If the output range of the activation function is limited, then it may be called a “*squashing function*.” Many activation functions are nonlinear and may be referred to as the “*nonlinearity*” in the layer or the network design.

The choice of activation function has a large impact on the capability and performance of the neural network, and different activation functions may be used in different parts of the model.

Technically, the activation function is used within or after the internal processing of each node in the network, although networks are designed to use the same activation function for all nodes in a layer.

A network may have three types of layers: input layers that take raw input from the domain, **hidden layers** that take input from another layer and pass output to another layer, and **output layers** that make a prediction.

All hidden layers typically use the same activation function. The output layer will typically use a different activation function from the hidden layers and is dependent upon the type of prediction required by the model.

Activation functions are also typically differentiable, meaning the first-order derivative can be calculated for a given input value. This is required given that neural networks are typically trained using the backpropagation of error algorithm that requires the derivative of prediction error in order to update the weights of the model.

There are many different types of activation functions used in neural networks, although perhaps only a small number are used in practice for hidden and output layers.

Let’s take a look at the activation functions used for each type of layer in turn.

A hidden layer in a neural network is a layer that receives input from another layer (such as another hidden layer or an input layer) and provides output to another layer (such as another hidden layer or an output layer).

A hidden layer does not directly contact input data or produce outputs for a model, at least in general.

A neural network may have zero or more hidden layers.

Typically, a differentiable nonlinear activation function is used in the hidden layers of a neural network. This allows the model to learn more complex functions than a network trained using a linear activation function.

In order to get access to a much richer hypothesis space that would benefit from deep representations, you need a non-linearity, or activation function.

— Page 72, Deep Learning with Python, 2017.

There are perhaps three activation functions you may want to consider for use in hidden layers; they are:

- Rectified Linear Activation (**ReLU**)
- Logistic (**Sigmoid**)
- Hyperbolic Tangent (**Tanh**)

This is not an exhaustive list of activation functions used for hidden layers, but they are the most commonly used.

Let’s take a closer look at each in turn.

The rectified linear activation function, or ReLU activation function, is perhaps the most common function used for hidden layers.

It is common because it is both simple to implement and effective at overcoming the limitations of other previously popular activation functions, such as Sigmoid and Tanh. Specifically, it is less susceptible to vanishing gradients that prevent deep models from being trained, although it can suffer from other problems like saturated or “*dead*” units.

The ReLU function is calculated as follows:

- max(0.0, x)

This means that if the input value (x) is negative, then a value 0.0 is returned; otherwise, the value is returned unchanged.

You can learn more about the details of the ReLU activation function in this tutorial:

We can get an intuition for the shape of this function with the worked example below.

```python
# example plot for the relu activation function
from matplotlib import pyplot

# rectified linear function
def rectified(x):
    return max(0.0, x)

# define input data
inputs = [x for x in range(-10, 10)]
# calculate outputs
outputs = [rectified(x) for x in inputs]
# plot inputs vs outputs
pyplot.plot(inputs, outputs)
pyplot.show()
```

Running the example calculates the outputs for a range of values and creates a plot of inputs versus outputs.

We can see the familiar kink shape of the ReLU activation function.

When using the ReLU function for hidden layers, it is a good practice to use a “*He Normal*” or “*He Uniform*” weight initialization and scale input data to the range 0-1 (normalize) prior to training.
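As an illustrative sketch of the “He Normal” scheme (a plain numpy version; the layer sizes here are made up, and in practice a deep learning library’s built-in initializer would normally be used):

```python
# sketch of "He normal" weight initialization for a ReLU hidden layer
from numpy.random import seed, randn
from numpy import sqrt

seed(1)
n_inputs, n_nodes = 100, 50

# He normal: Gaussian with standard deviation sqrt(2 / n_inputs)
std = sqrt(2.0 / n_inputs)
weights = randn(n_inputs, n_nodes) * std

# the empirical standard deviation should be close to the target
print(weights.std())
```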

The sigmoid activation function is also called the logistic function.

It is the same function used in the logistic regression classification algorithm.

The function takes any real value as input and outputs values in the range 0 to 1. The larger the input (more positive), the closer the output value will be to 1.0, whereas the smaller the input (more negative), the closer the output will be to 0.0.

The sigmoid activation function is calculated as follows:

- 1.0 / (1.0 + e^-x)

Where e is a mathematical constant, which is the base of the natural logarithm.

We can get an intuition for the shape of this function with the worked example below.

```python
# example plot for the sigmoid activation function
from math import exp
from matplotlib import pyplot

# sigmoid activation function
def sigmoid(x):
    return 1.0 / (1.0 + exp(-x))

# define input data
inputs = [x for x in range(-10, 10)]
# calculate outputs
outputs = [sigmoid(x) for x in inputs]
# plot inputs vs outputs
pyplot.plot(inputs, outputs)
pyplot.show()
```

Running the example calculates the outputs for a range of values and creates a plot of inputs versus outputs.

We can see the familiar S-shape of the sigmoid activation function.

When using the Sigmoid function for hidden layers, it is a good practice to use a “*Xavier Normal*” or “*Xavier Uniform*” weight initialization (also referred to as Glorot initialization, named for Xavier Glorot) and scale input data to the range 0-1 (e.g. the range of the activation function) prior to training.
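As an illustrative sketch of the “Xavier Uniform” scheme (a plain numpy version with made-up layer sizes; a library’s built-in Glorot initializer would normally be used instead):

```python
# sketch of "Xavier uniform" (Glorot) weight initialization
from numpy.random import seed, uniform
from numpy import sqrt

seed(1)
n_in, n_out = 100, 50

# Xavier uniform: U(-limit, limit) with limit = sqrt(6 / (n_in + n_out))
limit = sqrt(6.0 / (n_in + n_out))
weights = uniform(-limit, limit, size=(n_in, n_out))

# every sampled weight falls inside the limit
print(weights.min(), weights.max())
```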

The hyperbolic tangent activation function is also referred to simply as the Tanh (also “*tanh*” and “*TanH*“) function.

It is very similar to the sigmoid activation function and even has the same S-shape.

The function takes any real value as input and outputs values in the range -1 to 1. The larger the input (more positive), the closer the output value will be to 1.0, whereas the smaller the input (more negative), the closer the output will be to -1.0.

The Tanh activation function is calculated as follows:

- (e^x - e^-x) / (e^x + e^-x)

Where e is a mathematical constant that is the base of the natural logarithm.

We can get an intuition for the shape of this function with the worked example below.

```python
# example plot for the tanh activation function
from math import exp
from matplotlib import pyplot

# tanh activation function
def tanh(x):
    return (exp(x) - exp(-x)) / (exp(x) + exp(-x))

# define input data
inputs = [x for x in range(-10, 10)]
# calculate outputs
outputs = [tanh(x) for x in inputs]
# plot inputs vs outputs
pyplot.plot(inputs, outputs)
pyplot.show()
```

Running the example calculates the outputs for a range of values and creates a plot of inputs versus outputs.

We can see the familiar S-shape of the Tanh activation function.

When using the TanH function for hidden layers, it is a good practice to use a “*Xavier Normal*” or “*Xavier Uniform*” weight initialization (also referred to as Glorot initialization, named for Xavier Glorot) and scale input data to the range -1 to 1 (e.g. the range of the activation function) prior to training.

A neural network will almost always have the same activation function in all hidden layers.

It is most unusual to vary the activation function through a network model.

Traditionally, the sigmoid activation function was the default activation function in the 1990s. Perhaps through the mid to late 1990s to 2010s, the Tanh function was the default activation function for hidden layers.

… the hyperbolic tangent activation function typically performs better than the logistic sigmoid.

— Page 195, Deep Learning, 2016.

Both the sigmoid and Tanh functions can make the model more susceptible to problems during training, via the so-called vanishing gradients problem.

You can learn more about this problem in this tutorial:

The activation function used in hidden layers is typically chosen based on the type of neural network architecture.

Modern neural network models with common architectures, such as MLP and CNN, will make use of the ReLU activation function, or extensions.

In modern neural networks, the default recommendation is to use the rectified linear unit or ReLU …

— Page 174, Deep Learning, 2016.

Recurrent networks still commonly use Tanh or sigmoid activation functions, or even both. For example, the LSTM commonly uses the Sigmoid activation for recurrent connections and the Tanh activation for output.

- **Multilayer Perceptron (MLP)**: ReLU activation function.
- **Convolutional Neural Network (CNN)**: ReLU activation function.
- **Recurrent Neural Network**: Tanh and/or Sigmoid activation function.
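To make the recurrent case concrete, here is a toy, scalar sketch of a single LSTM step (the weights are made up and biases, vector states, and training are omitted; this is not a full LSTM implementation), showing sigmoid used for the gates and tanh for the candidate state and output:

```python
# toy sketch of one LSTM step: sigmoid for gates, tanh for state/output
from math import exp, tanh

def sigmoid(x):
    return 1.0 / (1.0 + exp(-x))

def lstm_step(x, h_prev, c_prev):
    # gates use sigmoid: values in (0, 1) act as soft on/off switches
    f = sigmoid(0.5 * x + 0.1 * h_prev)   # forget gate
    i = sigmoid(0.6 * x + 0.2 * h_prev)   # input gate
    o = sigmoid(0.4 * x + 0.3 * h_prev)   # output gate
    # candidate state and output use tanh: values in (-1, 1)
    g = tanh(0.7 * x + 0.2 * h_prev)
    c = f * c_prev + i * g                # new cell state
    h = o * tanh(c)                       # new hidden state (output)
    return h, c

h, c = lstm_step(1.0, 0.0, 0.0)
print(h, c)
```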

If you’re unsure which activation function to use for your network, try a few and compare the results.

The figure below summarizes how to choose an activation function for the hidden layers of your neural network model.

The output layer is the layer in a neural network model that directly outputs a prediction.

All feed-forward neural network models have an output layer.

There are perhaps three activation functions you may want to consider for use in the output layer; they are:

- Linear
- Logistic (Sigmoid)
- Softmax

This is not an exhaustive list of activation functions used for output layers, but they are the most commonly used.

Let’s take a closer look at each in turn.

The linear activation function is also called “*identity*” (multiplied by 1.0) or “*no activation*.”

This is because the linear activation function does not change the weighted sum of the input in any way and instead returns the value directly.

We can get an intuition for the shape of this function with the worked example below.

```python
# example plot for the linear activation function
from matplotlib import pyplot

# linear activation function
def linear(x):
    return x

# define input data
inputs = [x for x in range(-10, 10)]
# calculate outputs
outputs = [linear(x) for x in inputs]
# plot inputs vs outputs
pyplot.plot(inputs, outputs)
pyplot.show()
```

We can see a diagonal line shape where inputs are plotted against identical outputs.

Target values used to train a model with a linear activation function in the output layer are typically scaled prior to modeling using normalization or standardization transforms.
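A minimal sketch of this target scaling (manual min-max normalization on contrived values; in practice a library transform such as a scaler object would typically be used, and the same transform must be inverted on predictions):

```python
# normalize regression targets to 0..1 before training a model
# with a linear output activation (contrived target values)
targets = [50.0, 100.0, 150.0, 200.0]

t_min, t_max = min(targets), max(targets)
scaled = [(t - t_min) / (t_max - t_min) for t in targets]
print(scaled)

# predictions can be mapped back to the original scale afterwards
restored = [s * (t_max - t_min) + t_min for s in scaled]
print(restored)
```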

The sigmoid or logistic activation function was described in the previous section.

Nevertheless, to add some symmetry, we can review the shape of this function with the worked example below.

```python
# example plot for the sigmoid activation function
from math import exp
from matplotlib import pyplot

# sigmoid activation function
def sigmoid(x):
    return 1.0 / (1.0 + exp(-x))

# define input data
inputs = [x for x in range(-10, 10)]
# calculate outputs
outputs = [sigmoid(x) for x in inputs]
# plot inputs vs outputs
pyplot.plot(inputs, outputs)
pyplot.show()
```

We can see the familiar S-shape of the sigmoid activation function.

Target labels used to train a model with a sigmoid activation function in the output layer will have the values 0 or 1.

The softmax function outputs a vector of values that sum to 1.0 that can be interpreted as probabilities of class membership.

It is related to the argmax function that outputs a 0 for all options and 1 for the chosen option. Softmax is a “*softer*” version of argmax that allows a probability-like output of a winner-take-all function.

As such, the input to the function is a vector of real values and the output is a vector of the same length with values that sum to 1.0 like probabilities.

The softmax function is calculated as follows:

- e^x / sum(e^x)

Where *x* is a vector of outputs and e is a mathematical constant that is the base of the natural logarithm.

You can learn more about the details of the Softmax function in this tutorial:

We cannot plot the softmax function, but we can give an example of calculating it in Python.

```python
# softmax activation function
from numpy import exp

def softmax(x):
    return exp(x) / exp(x).sum()

# define input data
inputs = [1.0, 3.0, 2.0]
# calculate outputs
outputs = softmax(inputs)
# report the probabilities
print(outputs)
# report the sum of the probabilities
print(outputs.sum())
```

Running the example calculates the softmax output for the input vector.

We then confirm that the sum of the outputs of the softmax indeed sums to the value 1.0.

```
[0.09003057 0.66524096 0.24472847]
1.0
```

Target labels used to train a model with the softmax activation function in the output layer will be vectors with 1 for the target class and 0 for all other classes.

You must choose the activation function for your output layer based on the type of prediction problem that you are solving.

Specifically, the type of variable that is being predicted.

For example, you may divide prediction problems into two main groups, predicting a categorical variable (*classification*) and predicting a numerical variable (*regression*).

If your problem is a regression problem, you should use a linear activation function.

- **Regression**: One node, linear activation.

If your problem is a classification problem, then there are three main types of classification problems and each may use a different activation function.

Predicting a probability is not a regression problem; it is classification. In all cases of classification, your model will predict the probability of class membership (e.g. probability that an example belongs to each class) that you can convert to a crisp class label by rounding (for sigmoid) or argmax (for softmax).

If there are two mutually exclusive classes (binary classification), then your output layer will have one node and a sigmoid activation function should be used. If there are more than two mutually exclusive classes (multiclass classification), then your output layer will have one node per class and a softmax activation should be used. If there are two or more mutually inclusive classes (multilabel classification), then your output layer will have one node for each class and a sigmoid activation function is used.

- **Binary Classification**: One node, sigmoid activation.
- **Multiclass Classification**: One node per class, softmax activation.
- **Multilabel Classification**: One node per class, sigmoid activation.
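The rounding and argmax conversions mentioned above can be sketched as follows (the probability values here are contrived):

```python
# convert predicted probabilities to crisp class labels

# binary/multilabel: one sigmoid probability per node, threshold at 0.5
sigmoid_probs = [0.3, 0.8, 0.6]
labels = [round(p) for p in sigmoid_probs]
print(labels)

# multiclass: one softmax probability per class, take the argmax
softmax_probs = [0.09, 0.665, 0.245]
predicted_class = max(range(len(softmax_probs)), key=lambda i: softmax_probs[i])
print(predicted_class)
```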

The figure below summarizes how to choose an activation function for the output layer of your neural network model.

This section provides more resources on the topic if you are looking to go deeper.

In this tutorial, you discovered how to choose activation functions for neural network models.

Specifically, you learned:

- Activation functions are a key part of neural network design.
- The modern default activation function for hidden layers is the ReLU function.
- The activation function for output layers depends on the type of prediction problem.

**Do you have any questions?**

Ask your questions in the comments below and I will do my best to answer.