
Neural network models are trained using stochastic gradient descent and model weights are updated using the backpropagation algorithm.

The optimization solved by training a neural network model is very challenging and although these algorithms are widely used because they perform so well in practice, there are no guarantees that they will converge to a good model in a timely manner.

The challenge of training neural networks really comes down to the challenge of configuring the training algorithms.

In this post, you will discover tips and tricks for getting the most out of the backpropagation algorithm when training neural network models.

After reading this post, you will know:

- The challenge of training a neural network is really the balance between learning the training dataset and generalizing to new examples beyond the training dataset.
- Eight specific tricks that you can use to train better neural network models, faster.
- Second order optimization algorithms that can also be used to train neural networks under certain circumstances.

Let’s get started.

This tutorial is divided into five parts; they are:

- Efficient BackProp Overview
- Learning and Generalization
- 8 Practical Tricks for Backpropagation
- Second Order Optimization Algorithms
- Discussion and Conclusion

The 1998 book titled “Neural Networks: Tricks of the Trade” provides a collection of chapters by academics and neural network practitioners that describe best practices for configuring and using neural network models.

The book was updated at the cusp of the deep learning renaissance and a second edition was released in 2012 including 13 new chapters.

The first chapter in both editions is titled “*Efficient BackProp*,” written by Yann LeCun, Leon Bottou (both now at Facebook AI), Genevieve Orr, and Klaus-Robert Muller (Orr and Muller are also co-editors of the book).

The chapter is also available online for free as a pre-print.

The chapter was also summarized in a preface in both editions of the book titled “*Speed Learning*.”

It is an important chapter and document as it provides a near-exhaustive summary of how to best configure backpropagation under stochastic gradient descent as of 1998, and much of the advice is just as relevant today.

In this post, we will focus on this chapter or paper and attempt to distill the most relevant advice for modern deep learning practitioners.

For reference, the chapter is divided into 10 sections; they are:

- 1.1: Introduction
- 1.2: Learning and Generalization
- 1.3: Standard Backpropagation
- 1.4: A Few Practical Tricks
- 1.5: Convergence of Gradient Descent
- 1.6: Classical Second Order Optimization Methods
- 1.7: Tricks to Compute the Hessian Information in Multilayer Networks
- 1.8: Analysis of the Hessian in Multi-layer Networks
- 1.9: Applying Second Order Methods to Multilayer Networks
- 1.10: Discussion and Conclusion

We will focus on the tips and tricks for configuring backpropagation and stochastic gradient descent.


The chapter begins with a description of the general problem of the dual challenge of learning and generalization with neural network models.

The authors motivate the article by highlighting that the backpropagation algorithm is the most widely used algorithm to train neural network models because it works and because it is efficient.

Backpropagation is a very popular neural network learning algorithm because it is conceptually simple, computationally efficient, and because it often works. However, getting it to work well, and sometimes to work at all, can seem more of an art than a science.

The authors also remind us that training neural networks with backpropagation is really hard. Although the algorithm is both effective and efficient, it requires the careful configuration of multiple model properties and model hyperparameters, each of which requires deep knowledge of the algorithm and experience to set correctly.

And yet, there are no rules to follow to “*best*” configure a model and training process.

Designing and training a network using backprop requires making many seemingly arbitrary choices such as the number and types of nodes, layers, learning rates, training and test sets, and so forth. These choices can be critical, yet there is no foolproof recipe for deciding them because they are largely problem and data dependent.

Training a neural network model is especially challenging because it requires solving two hard problems at once:

- **Learning** the training dataset in order to best minimize the loss.
- **Generalizing** the model performance in order to make predictions on unseen examples.

There is a trade-off between these concerns, as a model that learns too well will generalize poorly, and a model that generalizes well may be underfit. The goal of training a neural network well is to find a happy balance between these two concerns.

This chapter is focused on strategies for improving the process of minimizing the cost function. However, these strategies must be used in conjunction with methods for maximizing the network’s ability to generalize, that is, to predict the correct targets for patterns the learning system has not previously seen.

Interestingly, the problem of training a neural network model is cast in terms of the bias-variance trade-off, often used to describe machine learning algorithms in general.

When fitting a neural network model, these terms can be defined as:

- **Bias**: A measure of how the network output, averaged across all datasets, differs from the desired function.
- **Variance**: A measure of how much the network output varies across datasets.

This framing casts defining the capacity of the model as a choice of bias, controlling the range of functions that can be learned. It casts variance as a function of the training process and the balance struck between overfitting the training dataset and generalization error.

This framing can also help in understanding the dynamics of model performance during training. That is, from a model with large bias and small variance in the beginning of training to a model with lower bias and higher variance at the end of training.

Early in training, the bias is large because the network output is far from the desired function. The variance is very small because the data has had little influence yet. Late in training, the bias is small because the network has learned the underlying function.

These are the normal dynamics of the model, although when training, we must guard against training the model too much and overfitting the training dataset. This makes the model fragile, pushing the bias down, specializing the model to training examples and, in turn, causing much larger variance.

However, if trained too long, the network will also have learned the noise specific to that dataset. This is referred to as overtraining. In such a case, the variance will be large because the noise varies between datasets.

A focus on the backpropagation algorithm means a focus on “*learning*” at the expense of temporarily ignoring “*generalization*,” which can be addressed later with the introduction of regularization techniques.

A focus on learning means a focus on minimizing loss both quickly (fast learning) and effectively (learning well).

The idea of this chapter, therefore, is to present minimization strategies (given a cost function) and the tricks associated with increasing the speed and quality of the minimization.

The focus of the chapter is a sequence of practical tricks for backpropagation to better train neural network models.

There are eight tricks; they are:

- 1.4.1: Stochastic Versus Batch Learning
- 1.4.2: Shuffling the Examples
- 1.4.3: Normalizing the Inputs
- 1.4.4: The Sigmoid
- 1.4.5: Choosing Target Values
- 1.4.6: Initializing the Weights
- 1.4.7: Choosing Learning Rates
- 1.4.8: Radial Basis Function vs Sigmoid

The section starts off with a comment that the optimization problem that we are trying to solve with stochastic gradient descent and backpropagation is challenging.

Backpropagation can be very slow particularly for multilayered networks where the cost surface is typically non-quadratic, non-convex, and high dimensional with many local minima and/or flat regions.

The authors go on to highlight that in choosing stochastic gradient descent and the backpropagation algorithm to optimize and update weights, we have no guarantees of performance.

There is no formula to guarantee that (1) the network will converge to a good solution, (2) convergence is swift, or (3) convergence even occurs at all.

These comments provide the context for the tricks that also make no guarantees but instead increase the likelihood of finding a better model, faster.

Let’s take a closer look at each trick in turn.

Many of the tricks are focused on sigmoid (s-shaped) activation functions, which are no longer best practice for use in hidden layers, having been replaced by the rectified linear activation function. As such, we will spend less time on sigmoid-related tricks.

This tip highlights the choice between using either stochastic or batch gradient descent when training your model.

Stochastic gradient descent, also called online gradient descent, refers to a version of the algorithm where the error gradient is estimated from a single randomly selected example from the training dataset and the model parameters (weights) are then updated.

It has the effect of training the model fast, although it can result in large, noisy updates to model weights.

Stochastic learning is generally the preferred method for basic backpropagation for the following three reasons:

1. Stochastic learning is usually much faster than batch learning.

2. Stochastic learning also often results in better solutions.

3. Stochastic learning can be used for tracking changes.

Batch gradient descent involves estimating the error gradient using the average across all examples in the training dataset. It is better understood from a theoretical perspective, but generally results in slower learning.
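The two update schemes can be contrasted with a minimal sketch. This is an illustrative toy linear-regression example (the data, learning rate, and epoch counts are my assumptions, not taken from the chapter):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy regression data (assumed for illustration): y = 2x + 1 + noise.
X = rng.normal(size=(100, 1))
y = 2.0 * X[:, 0] + 1.0 + 0.1 * rng.normal(size=100)

def stochastic_step(w, b, x_i, y_i, lr=0.1):
    """Update from the gradient of a single example (online/stochastic learning)."""
    err = float(x_i @ w + b - y_i)
    return w - lr * err * x_i, b - lr * err

def batch_step(w, b, lr=0.1):
    """Update from the gradient averaged over the whole training set."""
    err = X @ w + b - y
    return w - lr * (X.T @ err) / len(y), b - lr * err.mean()

# Stochastic learning: many small, noisy updates per epoch.
w, b = np.zeros(1), 0.0
for epoch in range(50):
    for i in rng.permutation(len(y)):  # one randomly ordered example at a time
        w, b = stochastic_step(w, b, X[i], y[i])

# Batch learning: one smooth update per pass over the data.
w2, b2 = np.zeros(1), 0.0
for step in range(2000):
    w2, b2 = batch_step(w2, b2)

# Both should land near the true values 2.0 and 1.0; the stochastic
# estimate fluctuates around the minimum, the batch estimate settles on it.
```

Note how the stochastic loop makes 100 noisy updates per epoch while the batch loop makes a single averaged update per pass, which is the trade-off the chapter describes.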

Despite the advantages of stochastic learning, there are still reasons why one might consider using batch learning:

1. Conditions of convergence are well understood.

2. Many acceleration techniques (e.g. conjugate gradient) only operate in batch learning.

3. Theoretical analysis of the weight dynamics and convergence rates are simpler.

Generally, the authors recommend using stochastic gradient descent where possible because it offers faster training of the model.

Despite the advantages of batch updates, stochastic learning is still often the preferred method particularly when dealing with very large data sets because it is simply much faster.

They suggest making use of a learning rate decay schedule in order to counter the noisy effect of the weight updates seen during stochastic gradient descent.

… noise, which is so critical for finding better local minima also prevents full convergence to the minimum. […] So in order to reduce the fluctuations we can either decrease (anneal) the learning rate or have an adaptive batch size.
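One common way to anneal the learning rate is an inverse-time decay schedule. The specific formula below is an illustrative choice of mine, not the chapter's prescription:

```python
def annealed_lr(base_lr, t, decay=0.01):
    """Inverse-time decay: lr_t = base_lr / (1 + decay * t).
    Early updates are large; later updates shrink, damping the
    fluctuations of stochastic learning near the minimum."""
    return base_lr / (1.0 + decay * t)

print(annealed_lr(0.1, 0))              # 0.1 at the start of training
print(round(annealed_lr(0.1, 100), 3))  # 0.05 after 100 steps
```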

They also suggest using mini-batches of samples to reduce the noise of the weight updates. This is where the error gradient is estimated across a small subset of samples from the training dataset instead of one sample in the case of stochastic gradient descent or all samples in the case of batch gradient descent.

This variation later became known as Mini-Batch Gradient Descent and is the default when training neural networks.

Another method to remove noise is to use “mini-batches”, that is, start with a small batch size and increase the size as training proceeds.
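A mini-batch update can be sketched as follows. The dataset, batch size, and learning rate here are assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)

# Noiseless toy data so convergence is easy to check.
X = rng.normal(size=(256, 3))
true_w = np.array([1.0, -2.0, 0.5])
y = X @ true_w

w = np.zeros(3)
lr, batch_size = 0.1, 32

for epoch in range(200):
    idx = rng.permutation(len(y))  # reshuffle each epoch
    for start in range(0, len(y), batch_size):
        batch = idx[start:start + batch_size]
        err = X[batch] @ w - y[batch]
        # Gradient estimated over a small subset: noisier than full batch,
        # smoother than single-example updates.
        w -= lr * X[batch].T @ err / len(batch)

# w should converge to true_w.
```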

This tip highlights the importance that the order of examples shown to the model during training has on the training process.

Generally, the authors highlight that the learning algorithm performs better when the next example used to update the model is different from the previous example. Ideally, it is the most different or unfamiliar to the model.

Networks learn the fastest from the most unexpected sample. Therefore, it is advisable to choose a sample at each iteration that is the most unfamiliar to the system.

One simple way to implement this trick is to ensure that successive examples used to update the model parameters are from different classes.

… a very simple trick that crudely implements this idea is to simply choose successive examples that are from different classes since training examples belonging to the same class will most likely contain similar information.

This trick can also be implemented by showing and re-showing the examples that the model gets most wrong, or makes the largest error on, when making a prediction. This approach can be effective, but can also lead to disaster if the examples that are over-represented during training are outliers.

Choose Examples with Maximum Information Content

1. Shuffle the training set so that successive training examples never (rarely) belong to the same class.

2. Present input examples that produce a large error more frequently than examples that produce a small error
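The first point, ordering examples so that successive ones come from different classes, can be crudely implemented with a round-robin over shuffled per-class pools. This is a rough sketch of my own (real, class-imbalanced datasets would need a tie-breaking rule once a class runs out):

```python
import numpy as np

rng = np.random.default_rng(0)

def interleave_classes(labels):
    """Order example indices so that successive examples come from
    different classes: shuffle each class's pool, then round-robin."""
    pools = {}
    for i, c in enumerate(labels):
        pools.setdefault(c, []).append(i)
    for pool in pools.values():
        rng.shuffle(pool)
    order = []
    while any(pools.values()):
        for c in sorted(pools):
            if pools[c]:
                order.append(pools[c].pop())
    return order

labels = [0] * 5 + [1] * 5
order = interleave_classes(labels)
print([labels[i] for i in order])  # [0, 1, 0, 1, 0, 1, 0, 1, 0, 1]
```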

This tip highlights the importance of data preparation prior to training a neural network model.

The authors point out that neural networks often learn faster when the input variables in the training dataset have a mean of zero. This can be achieved by subtracting the mean value from each input variable, called centering.

Convergence is usually faster if the average of each input variable over the training set is close to zero.

They also comment that this centering of inputs also improves the convergence of the model when applied to the inputs to hidden layers from prior layers. This is fascinating as it lays the foundation for the Batch Normalization technique developed and made widely popular nearly 15 years later.

Therefore, it is good to shift the inputs so that the average over the training set is close to zero. This heuristic should be applied at all layers which means that we want the average of the outputs of a node to be close to zero because these outputs are the inputs to the next layer

The authors also comment on the need to normalize the spread of the input variables. This can be achieved by dividing the values by their standard deviation. For variables that have a Gaussian distribution, centering and normalizing values in this way means that they will be reduced to a standard Gaussian with a mean of zero and a standard deviation of one.

Scaling speeds learning because it helps to balance out the rate at which the weights connected to the input nodes learn.

Finally, they suggest de-correlating the input variables. This means removing any linear dependence between the input variables and can be achieved using a Principal Component Analysis as a data transform.

Principal component analysis (also known as the Karhunen-Loeve expansion) can be used to remove linear correlations in inputs

This tip on data preparation can be summarized as follows:

Transforming the Inputs

1. The average of each input variable over the training set should be close to zero.

2. Scale input variables so that their covariances are about the same.

3. Input variables should be uncorrelated if possible.
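The three steps can be sketched with NumPy. The toy correlated inputs are an assumption for illustration; the de-correlation step uses an eigendecomposition of the covariance matrix, i.e. a basic PCA rotation:

```python
import numpy as np

rng = np.random.default_rng(0)

# Correlated toy inputs (assumed for illustration).
X = rng.normal(size=(500, 2)) @ np.array([[2.0, 0.0], [1.5, 0.5]])

# 1. Center: subtract the per-variable mean.
Xc = X - X.mean(axis=0)

# 2. Scale: divide by the per-variable standard deviation.
Xs = Xc / Xc.std(axis=0)

# 3. De-correlate: rotate onto the principal components
#    (the Karhunen-Loeve expansion the chapter mentions).
cov = np.cov(Xs, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)
Xd = Xs @ eigvecs

print(np.allclose(Xd.mean(axis=0), 0.0, atol=1e-9))        # still centered
print(abs(np.cov(Xd, rowvar=False)[0, 1]) < 1e-9)          # no correlation left
```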

These three recommended data preparation steps of centering, normalizing, and de-correlating are summarized nicely in a figure in the book.

The centering of input variables may or may not be the best approach when using the more modern ReLU activation functions in the hidden layers of your network, so I’d recommend evaluating both standardization and normalization procedures when preparing data for your model.

This tip recommends the use of sigmoid activation functions in the hidden layers of your network.

Nonlinear activation functions are what give neural networks their nonlinear capabilities. One of the most common forms of activation function is the sigmoid …

Specifically, the authors refer to a sigmoid activation function as any S-shaped function, such as the logistic (referred to as sigmoid) or hyperbolic tangent function (referred to as tanh).

Symmetric sigmoids such as hyperbolic tangent often converge faster than the standard logistic function.

The authors recommend modifying the default functions (if needed) so that the midpoint of the function is at zero.

The use of logistic and tanh activation functions for the hidden layers is no longer a sensible default, as models that use ReLU converge much faster.
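The distinction between the two sigmoids is easy to see at their midpoints. The logistic function is centered at 0.5 while tanh is symmetric about zero, so tanh outputs are roughly zero-mean, which is one reason the chapter prefers symmetric sigmoids for hidden layers:

```python
import numpy as np

def logistic(x):
    """Standard logistic sigmoid, output in (0, 1), midpoint 0.5."""
    return 1.0 / (1.0 + np.exp(-x))

def tanh(x):
    """Hyperbolic tangent, output in (-1, 1), midpoint 0 (zero-centered)."""
    return np.tanh(x)

print(logistic(0.0))  # 0.5
print(tanh(0.0))      # 0.0
```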

This tip highlights a more careful consideration of the choice of target variables.

In the case of binary classification problems, target variables may be in the set 0, 1 for the limits of the logistic activation function or in the set -1, 1 for the hyperbolic tangent function when using the cross-entropy or hinge loss functions respectively, even in modern neural networks.

The authors suggest that using values at the extremes of the activation function may make learning the problem more challenging.

Common wisdom might seem to suggest that the target values be set at the value of the sigmoid’s asymptotes. However, this has several drawbacks.

They suggest that achieving values at the point of saturation of the activation function (edges) may require larger and larger weights, which could make the model unstable.

One approach to addressing this is to use target values away from the edge of the output function.

Choose target values at the point of the maximum second derivative on the sigmoid so as to avoid saturating the output units.

I recall that in the 1990s, it was common advice to use target values of 0.1 and 0.9 with the logistic function instead of 0 and 1.

This tip highlights the importance of the choice of weight initialization scheme and how it is tightly related to the choice of activation function.

In the context of the sigmoid activation function, they suggest that the initial weights for the network should be chosen to activate the function in the linear region (e.g. the line part not the curve part of the S-shape).

The starting values of the weights can have a significant effect on the training process. Weights should be chosen randomly but in such a way that the sigmoid is primarily activated in its linear region.

This advice may also apply to weight initialization for the ReLU, where the linear part of the function is the positive part.

This highlights the important impact that initial weights have on learning, where large weights saturate the activation function, resulting in unstable learning, and small weights result in very small gradients and, in turn, slow learning. Ideally, we seek model weights that are over the linear (non-curvy) part of the activation function.

… weights that range over the sigmoid’s linear region have the advantage that (1) the gradients are large enough that learning can proceed and (2) the network will learn the linear part of the mapping before the more difficult nonlinear part.

The authors suggest a random weight initialization scheme that uses the number of nodes in the previous layer, the so-called fan-in. This is interesting as it is a precursor of what became known as the Xavier weight initialization scheme.
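A fan-in-scaled initialization can be sketched as follows. The use of a normal distribution with standard deviation 1/sqrt(fan_in) is one common reading of this idea (the chapter's exact prescription differs in detail, and the later Xavier/Glorot scheme also accounts for fan-out):

```python
import numpy as np

rng = np.random.default_rng(0)

def fan_in_init(n_in, n_out):
    """Draw weights with standard deviation 1 / sqrt(fan_in), so that
    pre-activations of a unit with n_in unit-variance inputs have
    roughly unit variance, keeping a sigmoid in its linear region."""
    return rng.normal(0.0, 1.0 / np.sqrt(n_in), size=(n_in, n_out))

W = fan_in_init(256, 128)
x = rng.normal(size=(1000, 256))  # unit-variance inputs
z = x @ W                          # pre-activations

print(float(z.std()))  # should be close to 1.0
```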

This tip highlights the importance of choosing the learning rate.

The learning rate is the amount that the model weights are updated each iteration of the algorithm. A small learning rate can cause slower convergence but perhaps a better result, whereas a larger learning rate can result in faster convergence but perhaps to a less optimal result.

The authors suggest decreasing the learning rate when the weight values begin changing back and forth, e.g. oscillating.

Most of those schemes decrease the learning rate when the weight vector “oscillates”, and increase it when the weight vector follows a relatively steady direction.

They comment that this is a hard strategy when using online gradient descent as, by default, the weights will oscillate a lot.

The authors also recommend using one learning rate for each parameter in the model. The goal is to help each part of the model to converge at the same rate.

… it is clear that picking a different learning rate (eta) for each weight can improve the convergence. […] The main philosophy is to make sure that all the weights in the network converge roughly at the same speed.

They refer to this property as “*equalizing the learning speeds*” of each model parameter.

Equalize the Learning Speeds

– give each weight its own learning rate

– learning rates should be proportional to the square root of the number of inputs to the unit

– weights in lower layers should typically be larger than in the higher layers

In addition to using a learning rate per parameter, the authors also recommend using momentum and using adaptive learning rates.

It’s interesting that these recommendations later became enshrined in methods like AdaGrad and Adam that are now popular defaults.
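The flavor of a per-weight adaptive rate can be sketched with an AdaGrad-style update (named for the later method mentioned above; this is my illustration, not the chapter's algorithm). Each weight's effective step size is scaled by its own accumulated squared gradients, so steep directions get small steps and shallow directions get large ones:

```python
import numpy as np

def adagrad_step(w, grad, accum, lr=0.5, eps=1e-8):
    """AdaGrad-style update: per-weight learning rate lr / sqrt(accum)."""
    accum += grad ** 2
    w -= lr * grad / (np.sqrt(accum) + eps)
    return w, accum

# Minimize f(w) = 0.5 * (w1^2 + 100 * w2^2), a badly scaled quadratic
# where a single global learning rate struggles.
scales = np.array([1.0, 100.0])
w = np.array([1.0, 1.0])
accum = np.zeros(2)
for _ in range(500):
    w, accum = adagrad_step(w, scales * w, accum)

print(np.abs(w).max() < 1e-2)  # both weights converge despite scale mismatch
```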

This final tip is perhaps less relevant today; it recommends trying radial basis functions (RBF) instead of sigmoid activation functions in some cases.

The authors suggest that training RBF units can be faster than training units using a sigmoid activation.

Unlike sigmoidal units which can cover the entire space, a single RBF unit covers only a small local region of the input space. This can be an advantage because learning can be faster.
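The local-response property of an RBF unit is easy to see in code. This is a minimal sketch of a single Gaussian RBF unit (the center, width, and test points are my assumptions):

```python
import numpy as np

def rbf_unit(x, center, width=1.0):
    """A single Gaussian radial basis unit: responds strongly only near
    its center, unlike a sigmoid unit whose response spans the whole
    input space."""
    return np.exp(-np.sum((x - center) ** 2) / (2.0 * width ** 2))

c = np.array([0.0, 0.0])
print(rbf_unit(c, c))                              # 1.0 at the center
print(rbf_unit(np.array([5.0, 5.0]), c) < 1e-5)    # essentially 0 far away
```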

After these tips, the authors go on to provide a theoretical grounding for why many of these tips are a good idea and are expected to result in better or faster convergence when training a neural network model.

Specifically, the tips supported by this analysis are:

- Subtract the means from the input variables
- Normalize the variances of the input variables.
- De-correlate the input variables.
- Use a separate learning rate for each weight.

The remainder of the chapter focuses on the use of second order optimization algorithms for training neural network models.

This may not be everyone’s cup of tea and requires a background and good memory of matrix calculus. You may want to skip it.

You may recall that the first derivative is the slope of a function (how steep it is) and that backpropagation uses the first derivative of the error to update the model weights. These methods are referred to as first order optimization algorithms, e.g. optimization algorithms that use the first derivative of the error in the output of the model.

You may also recall from calculus that the second order derivative is the rate of change in the first order derivative, or in this case, the gradient of the error gradient itself. It gives an idea of how curved the loss function is for the current set of weights. Algorithms that use the second derivative are referred to as second order optimization algorithms.
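The payoff of using curvature can be shown with a one-dimensional Newton update (a toy quadratic of my choosing). Dividing the gradient by the second derivative rescales the step so that, on a quadratic, a single update lands exactly on the minimum regardless of how steep the curvature is:

```python
# Toy quadratic loss f(w) = 2 * (w - 3)^2, with minimum at w = 3.
def f_grad(w):
    return 4.0 * (w - 3.0)   # first derivative (the gradient)

def f_hess(w):
    return 4.0               # second derivative (constant for a quadratic)

w = 0.0
w = w - f_grad(w) / f_hess(w)   # Newton update: step = H^-1 * g
print(w)  # 3.0 -- the minimum, reached in one step
```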

The authors go on to introduce five second order optimization algorithms, specifically:

- Newton
- Conjugate Gradient
- Gauss-Newton
- Levenberg Marquardt
- Quasi-Newton (BFGS)

These algorithms require access to the Hessian matrix or an approximation of the Hessian matrix. You may also recall the Hessian matrix if you covered a theoretical introduction to the backpropagation algorithm. In a hand-wavy way, we use the Hessian to describe the second order derivatives for the model weights.

The authors proceed to outline a number of methods that can be used to approximate the Hessian matrix (for use in second order optimization algorithms), such as: finite difference, square Jacobian approximation, the diagonal of the Hessian, and more.
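The finite-difference approach can be sketched as follows: each column of the Hessian is estimated by perturbing one weight and differencing the gradient. The verifying quadratic is my own test case, not from the chapter:

```python
import numpy as np

def finite_difference_hessian(grad, w, eps=1e-5):
    """Approximate the Hessian at w by central finite differences of
    the gradient function grad (one column per perturbed coordinate)."""
    n = len(w)
    H = np.zeros((n, n))
    for j in range(n):
        e = np.zeros(n)
        e[j] = eps
        H[:, j] = (grad(w + e) - grad(w - e)) / (2.0 * eps)
    return H

# Check on a known quadratic: f(w) = 0.5 * w @ A @ w has gradient A @ w
# and Hessian exactly A.
A = np.array([[2.0, 1.0], [1.0, 3.0]])
grad = lambda w: A @ w
H = finite_difference_hessian(grad, np.array([0.5, -0.2]))
print(np.allclose(H, A, atol=1e-6))  # True
```

In practice this costs 2n gradient evaluations for n weights, which is why the chapter also discusses cheaper approximations such as the diagonal of the Hessian.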

They then go on to analyze the Hessian in multilayer neural networks and the effectiveness of second order optimization algorithms.

In summary, they highlight that perhaps second order methods are more appropriate for smaller neural network models trained using batch gradient descent.

Classical second-order methods are impractical in almost all useful cases.

The chapter ends with a very useful summary of tips for getting the most out of backpropagation when training neural network models.

This summary is reproduced below:

– shuffle the examples

– center the input variables by subtracting the mean

– normalize the input variable to a standard deviation of 1

– if possible, de-correlate the input variables.

– pick a network with the sigmoid function shown in figure 1.4

– set the target values within the range of the sigmoid, typically +1 and -1.

– initialize the weights to random values (as prescribed by 1.16).

This section provides more resources on the topic if you are looking to go deeper.

In this post, you discovered tips and tricks for getting the most out of the backpropagation algorithm when training neural network models.

Have you tried any of these tricks on your projects?

Let me know about your results in the comments below.

Do you have any questions?

Ask your questions in the comments below and I will do my best to answer.

‘”+t+”‘ is not permitted to register their userSync “+e+” pixels as per filterSettings config.”)}else if(l.enabledBidders&&l.enabledBidders.length&&l.enabledBidders.indexOf(t)0&&void 0!==arguments[0]?arguments[0]:0;if(e)return setTimeout(t,Number(e));t()},n.triggerUserSyncs=function()l.enableOverride&&n.syncUsers(),n}Object.defineProperty(t,”__esModule”,value:!0),t.userSync=void 0;var i=function(){return function(e,t)if(Array.isArray(e))return e;if(Symbol.iterator in Object(e))return function(e,t)var n=[],r=!0,i=!1,o=void 0;tryfor(var a,u=e[Symbol.iterator]();!(r=(a=u.next()).done)&&(n.push(a.value),!tcatch(e)i=!0,o=efinallytry!r&&u.return&&u.return()finallyif(i)throw oreturn n(e,t);throw new TypeError(“Invalid attempt to destructure non-iterable instance”)}(),o=Object.assign||function(e)for(var t=1;t*

d.default)(window.googletag.pubads().getSlots().filter((0,s.isSlotMatchingAdUnitCode)(n)),function(e)return e).getSlotElementId()).querySelector(t)).style;o.width=r,o.height=i),p.postMessage(JSON.stringify(message:”Prebid Response”,ad:h,adUrl:v,adId:g,width:b,height:y),l)),c.auctionManager.addWinningBid(A),o.default.emit(f,A)),”Prebid Native”===_.message&&((0,a.fireNativeTrackers)(_,A),c.auctionManager.addWinningBid(A),o.default.emit(f,A))Object.defineProperty(t,”__esModule”,value:!0),t.listenMessagesFromCreative=function()addEventListener(“message”,i,!1);var o=r(n(9)),a=n(18),u=n(4),s=n(0),c=n(29),d=r(n(10)),f=u.EVENTS.BID_WON},558:function(e,t,n)function(e)for(var t=1;t

Worldwide Computer Services Pte Ltd

source ]]>

There’s plenty of speculation right now around apparently disgruntled investors in SoftBank’s Vision Fund, but the drum continues to beat and the checks continue to be written. The latest deal for the $100 billion mega-fund is Clutter, an on-demand storage company that pulled in $200 million in new financing for growth.

Eagle-eyed readers will recall that TechCrunch broke news of an impending SoftBank-led round of that size back in January, and now it is official.

The startup is one of a number of companies that provide storage options for consumers who don’t want to part with items but equally don’t have the space to keep them where they live. The service is based around an app used to summon Clutter staff to pack up, take away, store and (later) return possessions, but it can also be used for regular house moves. Competitors in the space include MakeSpace, Omni, Trove, Livible, and Closetbox.

Joining SoftBank in the deal are existing Clutter investors Sequoia, Atomico, GV, Fifth Wall and Four Rivers, who fronted the company’s last round, a $64 million raise nearly two years ago. This new capital means that Clutter has raised $297 million from investors to date.

There’s no confirmation of a valuation for the startup, but our well-placed sources previously told us that this round would value Clutter at between $400 million and $500 million. One thing that is confirmed, however, is that SoftBank’s Justin Wilson will join the board.

The money will go towards expansion in the U.S., as Clutter explained in an announcement, but there are hints that it harbors overseas ambitions, too:

This funding will accelerate the company’s expansion into new markets in 2019, including Philadelphia, Portland and Sacramento. It’s also doubling down in its existing markets in the greater areas of New York, San Francisco, Los Angeles, Chicago, Seattle, San Diego, Orange County and northern New Jersey, as it marches toward a goal of operating in America’s largest 50 cities and expanding internationally.

“We believe that storage is a vast and traditional market with huge potential for disruption, and Clutter’s technology and superior customer proposition will help facilitate future growth in expanding urban communities where space is at a premium,” said SoftBank’s Wilson in a statement.


ViSenze, a startup that provides visual search tools for online retailers like Rakuten and ASOS, announced today that it has raised a $20 million Series C. The round was co-led by Gobi Ventures and Sonae IM, with participation from other backers including returning investors Rakuten and WI Harper.

Founded in 2012, ViSenze has now raised a total of $34.5 million (its last round was a Series B announced in September 2016). The Singapore-based company, whose clients also include Urban Outfitters, Zalora, and Uniqlo, bills its software portfolio as a “personal shopping concierge” that allows shoppers to find or discover new products based on visual search, automatic photo tagging, and recommendations based on their browsing history. ViSenze’s verticals include fashion, jewelry, furniture, and intellectual property.

ViSenze’s latest funding will be used to develop its software through partnerships with smartphone makers including Samsung, LG, and Huawei. The company has offices in Asia, Europe, and the United States, and claims an annual revenue growth rate of more than 200 percent. Other startups in the same space include Syte.ai, Slyce, Clarifai, and Imagga.

In a statement, Rakuten Ventures partner Adit Swarup said “When we first invested in ViSenze in 2014, retailers had just started seeing the benefits of powering product recommendations with image data. Today, ViSenze not only powers recommendations for the largest brands in the world, but has helped pioneer a paradigm shift in e-commerce; helping consumers find products inside their favorite social media videos and images, as well as initiate a search directly from their camera app.”

Other participants in the round included returning investors Singapore Press Holdings (SPH) Ventures, Raffles Venture Partners, Enspire Capital, and UOB Venture Management, as well as new investors Tembusu ICT Fund, 31Ventures Global Innovation Fund, and Jonathan Coon’s Impossible Ventures.

Deep learning neural networks are challenging to configure and train.

There are decades of tips and tricks spread across hundreds of research papers, source code, and in the heads of academics and practitioners.

The book “Neural Networks: Tricks of the Trade,” originally published in 1998 and updated in 2012 at the cusp of the deep learning renaissance, ties together these disparate tips and tricks into a single volume. It includes advice that is required reading for all deep learning neural network practitioners.

In this post, you will discover the book “*Neural Networks: Tricks of the Trade*,” which provides advice from neural network academics and practitioners on how to get the most out of your models.

After reading this post, you will know:

- The motivation for why the book was written.
- A breakdown of the chapters and topics in the first and second editions.
- A list and summary of the must-read chapters for every neural network practitioner.

Let’s get started.

Neural Networks: Tricks of the Trade is a collection of papers on techniques to get better performance from neural network models.

The first edition was published in 1998 and comprised five parts and 17 chapters. The second edition was published in 2012, right on the cusp of the new deep learning renaissance, and added three more parts and 13 new chapters.

If you are a deep learning practitioner, then it is a must-read book.

I own and reference both editions.

The motivation for the book was to collate the empirical and theoretically grounded tips, tricks, and best practices used to get the best performance from neural network models in practice.

The editors’ concern is that many of the useful tips and tricks are tacit knowledge in the field, trapped in people’s heads, code bases, or at the end of conference papers, and that beginners to the field should be aware of them.

It is our belief that researchers and practitioners acquire, through experience and word-of-mouth, techniques and heuristics that help them successfully apply neural networks to difficult real-world problems. […] they are usually hidden in people’s heads or in the back pages of space-constrained conference papers.

The book is an effort to group these tricks together, following the success of a 1996 NIPS workshop of the same name.

This book is an outgrowth of a 1996 NIPS workshop called Tricks of the Trade whose goal was to begin the process of gathering and documenting these tricks. The interest that the workshop generated motivated us to expand our collection and compile it into this book.

— Page 1, Neural Networks: Tricks of the Trade, Second Edition, 2012.

The first edition of the book was put together (edited) by Genevieve Orr and Klaus-Robert Müller and published 20 years ago, in 1998; it comprised five parts and 17 chapters.

Each part includes a useful preface that summarizes what to expect in the upcoming chapters, and each chapter is written by one or more academics in the field.

The breakdown of this first edition was as follows:

- Chapter 1: Efficient BackProp

- Chapter 2: Early Stopping – But When?
- Chapter 3: A Simple Trick for Estimating the Weight Decay Parameter
- Chapter 4: Controlling the Hyperparameter Search on MacKay’s Bayesian Neural Network Framework
- Chapter 5: Adaptive Regularization in Neural Network Modeling
- Chapter 6: Large Ensemble Averaging

- Chapter 7: Square Unit Augmented, Radically Extended, Multilayer Perceptrons
- Chapter 8: A Dozen Tricks with Multitask Learning
- Chapter 9: Solving the Ill-Conditioning on Neural Network Learning
- Chapter 10: Centering Neural Network Gradient Factors
- Chapter 11: Avoiding Roundoff Error in Backpropagating Derivatives

- Chapter 12: Transformation Invariance in Pattern Recognition – Tangent Distance and Tangent Propagation
- Chapter 13: Combining Neural Networks and Context-Driven Search for On-Line Printed Handwriting Recognition in the Newton
- Chapter 14: Neural Network Classification and Prior Class Probabilities
- Chapter 15: Applying Divide and Conquer to Large Scale Pattern Recognition Tasks

- Chapter 16: Forecasting the Economy with Neural Nets: A Survey of Challenges and Solutions
- Chapter 17: How to Train Neural Networks

It is an expensive book, so if you can pick up a cheap second-hand copy of this first edition, I highly recommend doing so.


The second edition of the book was released in 2012, seemingly right at the beginning of the large push that became “deep learning.” As such, the book captures the new techniques at the time such as layer-wise pretraining and restricted Boltzmann machines.

It was too early to cover ReLU activations, CNNs on ImageNet, and the use of large LSTMs.

Nevertheless, the second edition included three new parts and 13 new chapters.

The breakdown of the additions in the second edition is as follows:

- Chapter 18: Stochastic Gradient Descent Tricks
- Chapter 19: Practical Recommendations for Gradient-Based Training of Deep Architectures
- Chapter 20: Training Deep and Recurrent Networks with Hessian-Free Optimization
- Chapter 21: Implementing Neural Networks Efficiently

- Chapter 22: Learning Feature Representations with K-Means
- Chapter 23: Deep Big Multilayer Perceptrons for Digit Recognition
- Chapter 24: A Practical Guide to Training Restricted Boltzmann Machines
- Chapter 25: Deep Boltzmann Machines and the Centering Trick
- Chapter 26: Deep Learning via Semi-supervised Embedding

- Chapter 27: A Practical Guide to Applying Echo State Networks
- Chapter 28: Forecasting with Recurrent Neural Networks: 12 Tricks
- Chapter 29: Solving Partially Observable Reinforcement Learning Problems with Recurrent Neural Networks
- Chapter 30: 10 Steps and Some Tricks to Set up Neural Reinforcement Controllers

The whole book is a good read, although I don’t recommend reading all of it if you are looking for quick and useful tips that you can use immediately.

This is because many of the chapters focus on the writers’ pet projects, or on highly specialized methods. Instead, I recommend reading four specific chapters, two from the first edition and two from the second.

The second edition of the book is worth purchasing for these four chapters alone, and I highly recommend picking up a copy for yourself, your team, or your office.

Fortunately, there are pre-print PDFs of these chapters available for free online.

The recommended chapters are:

- Chapter 1: Efficient BackProp (first edition)
- Chapter 2: Early Stopping – But When? (first edition)
- Chapter 18: Stochastic Gradient Descent Tricks (second edition)
- Chapter 19: Practical Recommendations for Gradient-Based Training of Deep Architectures (second edition)

Let’s take a closer look at each of these chapters in turn.

This chapter focuses on providing very specific tips to get the most out of the stochastic gradient descent optimization algorithm and the backpropagation weight update algorithm.

Many undesirable behaviors of backprop can be avoided with tricks that are rarely exposed in serious technical publications. This paper gives some of those tricks, and offers explanations of why they work.

— Page 9, Neural Networks: Tricks of the Trade, First Edition, 1998.

The chapter proceeds to provide a dense and theoretically supported list of tips for configuring the algorithm, preparing input data, and more.

The chapter is so dense that it is hard to summarize, although a good list of recommendations is provided in the “*Discussion and Conclusion*” section at the end, quoted from the book below:

– shuffle the examples

– center the input variables by subtracting the mean

– normalize the input variable to a standard deviation of 1

– if possible, decorrelate the input variables.

– pick a network with the sigmoid function shown in figure 1.4

– set the target values within the range of the sigmoid, typically +1 and -1.

– initialize the weights to random values as prescribed by 1.16.

The preferred method for training the network should be picked as follows:

– if the training set is large (more than a few hundred samples) and redundant, and if the task is classification, use stochastic gradient with careful tuning, or use the stochastic diagonal Levenberg Marquardt method.

– if the training set is not too large, or if the task is regression, use conjugate gradient.

— Pages 47-48, Neural Networks: Tricks of the Trade, First Edition, 1998.

The field of applied neural networks has come a long way in the twenty years since this was published (e.g. the comments on sigmoid activation functions are no longer relevant), yet the basics have not changed.

This chapter is required reading for all deep learning practitioners.
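Twenty years on, the data-preparation tricks in that list map directly onto standard practice. The sketch below uses a toy dataset of my own (not from the book) to show shuffling, centering, and unit-variance scaling, plus a fan-in-scaled random weight initialization in the spirit of the book’s prescription (the exact formula 1.16 is sigmoid-specific and not reproduced here):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy training data: 200 samples, 3 input variables on very different scales.
X = rng.normal(loc=[10.0, -5.0, 100.0], scale=[1.0, 20.0, 0.1], size=(200, 3))
y = rng.integers(0, 2, size=200)

# Trick: shuffle the examples.
perm = rng.permutation(len(X))
X, y = X[perm], y[perm]

# Tricks: center each input variable by subtracting the mean,
# then scale to a standard deviation of 1.
X = (X - X.mean(axis=0)) / X.std(axis=0)

# Trick: small random initial weights scaled by fan-in, in the spirit of the
# book's prescription for sigmoid networks.
fan_in = X.shape[1]
w = rng.normal(0.0, 1.0 / np.sqrt(fan_in), size=fan_in)

print(X.mean(axis=0), X.std(axis=0))  # centered inputs with unit variance
```

Decorrelating the inputs (the remaining tip) would today typically be done with PCA whitening; it is omitted here to keep the sketch minimal.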

This chapter describes the simple yet powerful regularization method called early stopping that will halt the training of a neural network when the performance of the model begins to degrade on a hold-out validation dataset.

Validation can be used to detect when overfitting starts during supervised training of a neural network; training is then stopped before convergence to avoid the overfitting (“early stopping”)

— Page 55, Neural Networks: Tricks of the Trade, First Edition, 1998.

The challenge of early stopping is the choice and configuration of the trigger used to stop the training process, and the systematic configuration of early stopping is the focus of the chapter.

The general early stopping criteria are described as:

- **GL**: stop as soon as the generalization loss exceeds a specified threshold.
- **PQ**: stop as soon as the quotient of generalization loss and progress exceeds a threshold.
- **UP**: stop when the generalization error increases in strips.

Three recommendations are provided, i.e. “the trick”:

1. Use fast stopping criteria unless small improvements of network performance (e.g. 4%) are worth large increases of training time (e.g. factor 4).

2. To maximize the probability of finding a “good” solution (as opposed to maximizing the average quality of solutions), use a GL criterion.

3. To maximize the average quality of solutions, use a PQ criterion if the network overfits only very little or an UP criterion otherwise.

— Page 60, Neural Networks: Tricks of the Trade, First Edition, 1998.

The rules are analyzed empirically over a large number of training runs and test problems. The crux of the finding is that being more patient with the early stopping criteria results in better hold-out performance at the cost of additional computational complexity.

I conclude slower stopping criteria allow for small improvements in generalization (here: about 4% on average), but cost much more training time (here: about factor 4 longer on average).

— Page 55, Neural Networks: Tricks of the Trade, First Edition, 1998.
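To make the GL criterion concrete, here is a minimal sketch; the function name, threshold, and numbers are my own, while the formula follows the chapter’s definition of generalization loss as the relative increase of the validation error over the minimum observed so far:

```python
def gl_stopping(val_errors, alpha=5.0):
    """Return the epoch (index) at which the GL criterion fires, or None.

    GL(t) = 100 * (E_va(t) / E_opt(t) - 1), where E_opt(t) is the lowest
    validation error observed up to epoch t; stop once GL(t) > alpha.
    """
    best = float("inf")
    for t, e in enumerate(val_errors):
        best = min(best, e)
        gl = 100.0 * (e / best - 1.0)
        if gl > alpha:
            return t
    return None

# Validation error falls, then rises: the criterion fires on the rebound.
errors = [1.0, 0.8, 0.6, 0.55, 0.56, 0.60, 0.70]
print(gl_stopping(errors, alpha=5.0))  # -> 5
```

A larger `alpha` makes the criterion more patient, matching the chapter’s finding that slower stopping trades extra training time for slightly better generalization.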

This chapter focuses on a detailed review of the stochastic gradient descent optimization algorithm and tips to help get the most out of it.

This chapter provides background material, explains why SGD is a good learning algorithm when the training set is large, and provides useful recommendations.

— Page 421, Neural Networks: Tricks of the Trade, Second Edition, 2012.

There is a lot of overlap with *Chapter 1: Efficient BackProp*, and although the chapter calls out tips along the way with boxes, a useful list of tips is not summarized at the end of the chapter.

Nevertheless, it is a compulsory read for all neural network practitioners.

Below is my own summary of the tips called out in boxes throughout the chapter, mostly quoting directly from the second edition:

- Use stochastic gradient descent (batch=1) when training time is the bottleneck.
- Randomly shuffle the training examples.
- Use preconditioning techniques.
- Monitor both the training cost and the validation error.
- Check the gradients using finite differences.
- Experiment with the learning rates [with] a small sample of the training set.
- Leverage the sparsity of the training examples.
- Use a decaying learning rate.
- Try averaged stochastic gradient (i.e. a specific variant of the algorithm).

Some of these tips are pithy without context; I recommend reading the chapter.
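Several of those tips can be seen working together in a minimal pure-SGD loop. The toy regression problem and hyperparameter values below are my own, chosen only to illustrate shuffling, batch-size-1 updates, and a decaying learning rate:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy linear regression data: y = 2*x + 1 plus a little noise.
X = rng.uniform(-1, 1, size=(500, 1))
y = 2.0 * X[:, 0] + 1.0 + rng.normal(0, 0.01, size=500)

w, b = 0.0, 0.0
eta0, decay = 0.5, 0.01  # decaying schedule: eta_t = eta0 / (1 + decay*t)

t = 0
for epoch in range(5):
    perm = rng.permutation(len(X))      # tip: randomly shuffle the examples
    for i in perm:                      # tip: batch size 1 (pure SGD)
        eta = eta0 / (1.0 + decay * t)  # tip: use a decaying learning rate
        err = (w * X[i, 0] + b) - y[i]
        w -= eta * err * X[i, 0]        # gradient of 0.5*err^2 w.r.t. w
        b -= eta * err                  # gradient of 0.5*err^2 w.r.t. b
        t += 1

print(w, b)  # should approach 2.0 and 1.0
```

Checking these analytic gradients against finite differences, another of the chapter’s tips, is a one-line extension left out here for brevity.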

This chapter focuses on the effective training of neural networks and early deep learning models.

It ties together the classical advice from Chapters 1 and 29 but adds comments on (at the time) recent deep learning developments like greedy layer-wise pretraining, modern hardware like GPUs, modern efficient code libraries like BLAS, and advice from real projects tuning the training of models, like the order to train hyperparameters.

This chapter is meant as a practical guide with recommendations for some of the most commonly used hyper-parameters, in particular in the context of learning algorithms based on backpropagated gradient and gradient-based optimization.

— Page 437, Neural Networks: Tricks of the Trade, Second Edition, 2012.

It’s also long, divided into six main sections:

- **Deep Learning Innovations**. Including greedy layer-wise pretraining, denoising autoencoders, and online learning.
- **Gradients**. Including mini-batch gradient descent and automatic differentiation.
- **Hyperparameters**. Including learning rate, mini-batch size, epochs, momentum, nodes, weight regularization, activity regularization, hyperparameter search, and recommendations.
- **Debugging and Analysis**. Including monitoring loss for overfitting, visualization, and statistics.
- **Other Recommendations**. Including GPU hardware and use of efficient linear algebra libraries such as BLAS.
- **Open Questions**. Including the difficulty of training deep models and adaptive learning rates.

There’s far too much for me to summarize; the chapter is dense with useful advice for configuring and tuning neural network models.
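One concrete recommendation from the hyperparameter material is to sample the learning rate on a log scale rather than a linear one. A small sketch of that idea, with a stand-in scoring function of my own in place of a real validation run (a real search would train a model per candidate):

```python
import numpy as np

rng = np.random.default_rng(2)

def validation_error(lr):
    # Hypothetical stand-in for "train the model with this learning rate and
    # measure validation error"; it pretends the best rate is near 0.01.
    return (np.log10(lr) - np.log10(0.01)) ** 2

# Sample candidates log-uniformly in [1e-5, 1], so each decade of learning
# rates is explored equally, rather than wasting samples near 1.0.
candidates = 10 ** rng.uniform(-5, 0, size=20)
best_lr = min(candidates, key=validation_error)
print(best_lr)
```

The same log-uniform sampling applies to other scale-sensitive hyperparameters such as weight decay.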

Without a doubt, this is required reading and provided the seeds for the recommendations later described in the 2016 book Deep Learning, of which Yoshua Bengio was one of three authors.

The chapter finishes on a strong, optimistic note.

The practice summarized here, coupled with the increase in available computing power, now allows researchers to train neural networks on a scale that is far beyond what was possible at the time of the first edition of this book, helping to move us closer to artificial intelligence.

— Page 473, Neural Networks: Tricks of the Trade, Second Edition, 2012.

In this post, you discovered the book “*Neural Networks: Tricks of the Trade*” that provides advice from neural network academics and practitioners on how to get the most out of your models.

Have you read some or all of this book? What do you think of it?

Let me know in the comments below.