The universal workflow of machine learning

Presented here is a universal blueprint that you can use to attack and solve any machine learning problem, tying together the different concepts you learned about in this chapter: problem definition, evaluation, feature engineering, and fighting overfitting.

Define the problem and assemble a dataset

  1. What will your input data be? What will you be trying to predict?
    • One input to one output
    • Multiple inputs to one output
    • One input to multiple outputs
    • Multiple inputs to multiple outputs
  2. What type of problem are you facing?
    • Binary classification
    • Multi-class classification
    • Scalar regression
    • Vector regression
    • Multi-class, multi-label classification
    • Clustering
    • Generation
    • Reinforcement learning

Identifying the problem type will guide your choice of model architecture, loss function, and so on.

Pick a measure of success

  1. Balanced classification problems
    • Accuracy
    • ROC-AUC (Receiver Operating Characteristic Area Under the Curve)
  2. Class-imbalanced problems
    • Precision-Recall
  3. Ranking problems or multi-label classification
    • Mean Average Precision
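
As a minimal illustration of computing these metrics (assuming scikit-learn and NumPy are available; the label and score arrays below are made up purely for the example):

```python
import numpy as np
from sklearn.metrics import (accuracy_score, roc_auc_score,
                             average_precision_score)

# Hypothetical ground-truth labels and model scores, for illustration only.
y_true = np.array([0, 1, 1, 0, 1, 0])
y_score = np.array([0.2, 0.8, 0.6, 0.3, 0.9, 0.4])

# Balanced problem: accuracy (after thresholding the scores) and ROC-AUC.
print(accuracy_score(y_true, y_score > 0.5))
print(roc_auc_score(y_true, y_score))

# Class-imbalanced or ranking problem: precision-recall, summarized
# here as average precision.
print(average_precision_score(y_true, y_score))
```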

Decide on an evaluation protocol

Once you know what you are aiming for, you must establish how you will measure your current progress. We have previously reviewed three common evaluation protocols:

  1. Maintaining a hold-out validation set; this is the way to go when you have plenty of data.
  2. Doing K-fold cross-validation; this is the way to go when you have too few samples for hold-out validation to be reliable (a rough sketch follows this list).
  3. Doing iterated K-fold validation; this is for performing highly accurate model evaluation when little data is available.
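
Here is a rough sketch of plain K-fold cross-validation with a small Keras model. The data is synthetic and the architecture arbitrary, chosen only to make the example self-contained:

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

# Synthetic stand-in data, purely for illustration.
data = np.random.random((400, 16))
labels = (np.random.random((400,)) > 0.5).astype("float32")

def build_model():
    # A small, arbitrary binary classifier; rebuilt from scratch per fold.
    model = keras.Sequential([
        layers.Dense(16, activation="relu"),
        layers.Dense(1, activation="sigmoid"),
    ])
    model.compile(optimizer="rmsprop", loss="binary_crossentropy",
                  metrics=["accuracy"])
    return model

k = 4
num_val = len(data) // k
val_scores = []
for fold in range(k):
    # This fold's validation partition; everything else is training data.
    val_data = data[fold * num_val:(fold + 1) * num_val]
    val_labels = labels[fold * num_val:(fold + 1) * num_val]
    train_data = np.concatenate(
        [data[:fold * num_val], data[(fold + 1) * num_val:]])
    train_labels = np.concatenate(
        [labels[:fold * num_val], labels[(fold + 1) * num_val:]])

    model = build_model()
    model.fit(train_data, train_labels, epochs=5, batch_size=32, verbose=0)
    _, val_acc = model.evaluate(val_data, val_labels, verbose=0)
    val_scores.append(val_acc)

# The final score is the average of the per-fold validation scores.
print("mean validation accuracy:", np.mean(val_scores))
```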

Prepare your data

Once you know what you are training on, what you are optimizing for, and how to evaluate your approach, you are almost ready to start training models. But first, you should format your data in a way that can be fed into a machine learning model.

  1. As we saw previously, your data should be formatted as tensors.
  2. The values taken by these tensors should usually be scaled to small values, e.g. in the [-1, 1] range or the [0, 1] range.
  3. If different features take values in different ranges (heterogeneous data), then the data should be normalized (see the sketch after this list).
  4. You may want to do some feature engineering, especially for small data problems.
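
For instance, feature-wise normalization might look like the following sketch. The arrays are synthetic stand-ins for heterogeneous tabular features; the key point is that the mean and standard deviation are computed on the training data only and then applied to the test data:

```python
import numpy as np

# Illustrative stand-in data: 8 features with very different value ranges.
scales = np.array([1, 10, 100, 1, 5, 50, 2, 20])
train_data = np.random.random((100, 8)) * scales
test_data = np.random.random((20, 8)) * scales

# Feature-wise normalization: center and scale using statistics computed
# on the training data only, then apply the same statistics to the test data.
mean = train_data.mean(axis=0)
std = train_data.std(axis=0)
train_data = (train_data - mean) / std
test_data = (test_data - mean) / std
```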

Develop a model that does better than a baseline

Your goal at this stage is to achieve “statistical power”, i.e. develop a small model that is capable of beating a dumb baseline. Note that it is not always possible to achieve statistical power. If you cannot beat a random baseline after trying multiple reasonable architectures, it may be that the answer to the question you are asking isn't actually present in the input data. Remember that you are making two hypotheses:

  1. You are hypothesizing that your outputs can be predicted given your inputs.
  2. You are hypothesizing that your available data is sufficiently informative to learn the relationship between inputs and outputs.
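
As a small illustration of what beating a dumb baseline means (the label and prediction arrays here are made up): a common-sense baseline for classification is to always predict the most frequent class, and your model only has statistical power if it does better than that.

```python
import numpy as np

# Hypothetical validation labels and model predictions, for illustration.
val_labels = np.array([0, 0, 0, 1, 0, 1, 0, 0])
model_preds = np.array([0, 0, 1, 1, 0, 1, 0, 0])

# Dumb baseline: always predict the most frequent class.
majority_class = np.bincount(val_labels).argmax()
baseline_acc = (val_labels == majority_class).mean()
model_acc = (val_labels == model_preds).mean()

# The model has "statistical power" only if it beats the baseline.
print(baseline_acc, model_acc, model_acc > baseline_acc)
```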

Here is a table to help you pick a last-layer activation and a loss function for a few common problem types:

Problem type                               Last-layer activation   Loss function
Binary classification                      sigmoid                 binary_crossentropy
Multi-class, single-label classification   softmax                 categorical_crossentropy
Multi-class, multi-label classification    sigmoid                 binary_crossentropy
Regression to arbitrary values             None                    mse
Regression to values between 0 and 1       sigmoid                 mse or binary_crossentropy
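
For instance, the first row of the table could translate into a Keras model like the following sketch (the layer count and sizes are arbitrary starting points, not part of the table):

```python
from tensorflow import keras
from tensorflow.keras import layers

# Minimal binary-classification model matching the first table row:
# a sigmoid last-layer activation paired with a binary_crossentropy loss.
model = keras.Sequential([
    layers.Dense(16, activation="relu"),
    layers.Dense(16, activation="relu"),
    layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="rmsprop",
              loss="binary_crossentropy",
              metrics=["accuracy"])
```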

Scale up: develop a model that overfits

Once you have obtained a model that has statistical power, the question becomes: is your model powerful enough? Does it have enough layers and parameters to properly model the problem at hand?

To figure out how big a model you will need, you must develop a model that overfits. This is fairly easy:

  1. Add layers.
  2. Make your layers bigger.
  3. Train for more epochs.

Always monitor the training loss and validation loss, as well as the training and validation values for any metrics you care about. When you see that the performance of the model on the validation data starts degrading, you have achieved overfitting.
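
A sketch of that monitoring, using the history object returned by fit (the data and architecture here are synthetic placeholders):

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

# Synthetic stand-in data, for illustration only.
x = np.random.random((500, 16))
y = (np.random.random((500,)) > 0.5).astype("float32")

model = keras.Sequential([
    layers.Dense(64, activation="relu"),
    layers.Dense(64, activation="relu"),
    layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="rmsprop", loss="binary_crossentropy",
              metrics=["accuracy"])

# validation_split holds out the last 20% of samples; history.history
# records per-epoch metrics. A training loss that keeps falling while
# "val_loss" starts rising is the signal that the model now overfits.
history = model.fit(x, y, epochs=30, batch_size=32,
                    validation_split=0.2, verbose=0)
print(history.history["loss"][-1], history.history["val_loss"][-1])
```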

The next stage is to start regularizing and tuning your model, in order to get as close as possible to the ideal model: one that neither underfits nor overfits.

Regularize your model and tune your hyperparameters

You will repeatedly modify your model, train it, evaluate it on your validation data (not your test data at this point), and modify it again, until your model is as good as it can get.

These are some of the things you should be trying (a brief Keras sketch combining a few of them follows this list):

  1. Add dropout.
  2. Try different architectures, add or remove layers.
  3. Add L1 / L2 regularization.
  4. Try different hyperparameters (such as the number of units per layer, the learning rate of the optimizer) to find the optimal configuration.
  5. Optionally iterate on feature engineering: add new features, remove features that do not seem to be informative.
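
As referenced above, here is a brief sketch of what dropout plus L2 weight regularization might look like in Keras; the specific rates and layer sizes are arbitrary starting points, not recommendations:

```python
from tensorflow import keras
from tensorflow.keras import layers, regularizers

# A regularized variant of the earlier binary-classification sketch:
# L2 weight penalties on the dense layers plus dropout between them.
model = keras.Sequential([
    layers.Dense(16, activation="relu",
                 kernel_regularizer=regularizers.l2(0.001)),
    layers.Dropout(0.5),
    layers.Dense(16, activation="relu",
                 kernel_regularizer=regularizers.l2(0.001)),
    layers.Dropout(0.5),
    layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="rmsprop",
              loss="binary_crossentropy",
              metrics=["accuracy"])
```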

Be mindful of the following: every time you are using feedback from your validation process in order to tune your model, you are leaking information about your validation process into your model. Repeated just a few times, this is innocuous, but done systematically over many iterations, it will eventually cause your model to overfit to the validation process (even though no model is directly trained on any of the validation data). This makes your evaluation process less reliable, so keep it in mind.

Once you have developed a seemingly good enough model configuration, you can train your final production model on all available data (training and validation) and evaluate it one last time on the test set. If it turns out that the performance on the test set is significantly worse than the performance measured on the validation data, this could mean either that your validation procedure wasn't that reliable after all, or that you have started overfitting to the validation data while tuning the parameters of the model. In this case you may want to switch to a more reliable evaluation protocol (e.g. iterated K-fold validation).
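
Sketched in code, that final step might look like the following; the array names and best_num_epochs are placeholders standing in for whatever your earlier experiments produced, and model is assumed to be a freshly built, untrained Keras model:

```python
import numpy as np

# Placeholder names: train_data, train_labels, val_data, val_labels,
# test_data, test_labels, model and best_num_epochs are assumed to exist.
full_data = np.concatenate([train_data, val_data])
full_labels = np.concatenate([train_labels, val_labels])

# Retrain on all non-test data with the tuned configuration.
model.fit(full_data, full_labels, epochs=best_num_epochs, batch_size=32)

# One final, honest measurement on data never used for tuning.
test_loss, test_acc = model.evaluate(test_data, test_labels)
```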

In summary

This is the universal workflow of machine learning:

  1. Define the problem at hand and the data you will be training on; collect this data or annotate it with labels if need be.
  2. Choose how you will measure success on your problem. Which metrics will you be monitoring on your validation data?
  3. Determine your evaluation protocol: hold-out validation? K-fold validation? Which portion of the data should you use for validation?
  4. Develop a first model that does better than a basic baseline: a model that has “statistical power”.
  5. Develop a model that overfits.
  6. Regularize your model and tune its hyperparameters, based on performance on the validation data.