The motivation for this blog post largely comes from taking the course Structuring your Machine Learning Project, taught by Andrew Ng. Having worked on 4 ML class projects and developed 2 proof-of-concept ML applications, I found this course highly relatable, so I felt an urge to write down some of my key takeaways along with a few add-on thoughts.
Building an ML model is highly empirical. First, you fit your training data to some kinds of models, see how they perform, and tune your models from there. However, the tuning part is a difficult one. If you just need a decent model, then with all the available open-source packages and online tutorials, it's easy to drop an existing model into your application with a basic understanding of how the algorithm works and a few lines of code. However, if you want to improve the performance of your basic models, where do you start? The reason ML is so hard is that it's exponentially hard to debug. I took this picture from Zayd's blog since it illustrates the point nicely.
There are so many things you could try to improve your ML models, and if you choose poorly, it's entirely possible to spend months charging in some direction only to realize that all those experiments did you no good. Thus, it's worth having a strategy in mind that points you toward the most promising things to try.
Orthogonalization is the idea of changing only one tuning direction at a time, instead of changing things that belong to multiple directions all at once. The analogy that Andrew uses is car design: if you have one knob to control the steering angle and a separate knob to control the speed, then it's easy to control the car. On the other hand, if you have a single knob that controls some combination of both, for example
0.3 x angle + 0.5 x speed
, then it would be very confusing to steer the car in the right direction at the right speed. Let's be grateful that our car manufacturers embrace the concept of orthogonalization.
To achieve a good ML system, there are 4 crucial tuning dimensions to look at. Your model should:
- fit the training set well on the cost function
- perform well on the dev set on the cost function
- perform well on the test set on the cost function
- perform well in the real world
Early stopping is one example of trying to control both the "training set" and "dev set" dimensions at the same time: stopping earlier makes the model fit the training set less well, while (hopefully) improving dev-set performance, so it is not an orthogonal knob.
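To make this concrete, here is a minimal sketch (my own, not from the course) of a patience-based early-stopping rule: it halts training based on dev-set error, which is exactly why it couples the two dimensions.

```python
def stop_epoch(dev_errors, patience=2):
    """Return the epoch with the lowest dev-set error, scanning until
    no improvement has been seen for `patience` consecutive epochs."""
    best_err = float("inf")
    best_epoch = 0
    for epoch, err in enumerate(dev_errors):
        if err < best_err:
            best_err, best_epoch = err, epoch
        elif epoch - best_epoch >= patience:
            break  # dev error stopped improving: halt training here
    return best_epoch

# Dev error improves until epoch 2, then degrades: stop there.
print(stop_epoch([0.30, 0.25, 0.24, 0.26, 0.27]))  # → 2
```

Note that the stopping decision depends only on dev-set error, while the effect (fewer training epochs) hits training-set fit.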
Key takeaway
Pick a single evaluation metric, which can come from averaging candidate metrics or from some kind of combined metric (e.g. F1 score, area under the ROC curve)
Having a well-defined dev set and a single evaluation metric speeds up the iterative process of improving ML algorithms.
But there are so many evaluation metrics, and if your model needs more than one metric to indicate its performance, try to find a way to combine them. For instance, for image classification it's good practice to consider both Precision and Recall, but if the results come out as in the table below, how do you know which classifier is better? The best approach is to combine your original evaluation metrics into one; in this case, the F1 score.
| Classifier | Precision | Recall | F1 score |
|---|---|---|---|
| A | 95% | 90% | 92.4% |
| B | 98% | 85% | 91.0% |
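The F1 score is the harmonic mean of precision and recall; here is a quick sketch to reproduce the values in the table:

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# Reproduce the table above (values in %):
print(round(f1_score(0.95, 0.90) * 100, 1))  # → 92.4  (classifier A)
print(round(f1_score(0.98, 0.85) * 100, 1))  # → 91.0  (classifier B)
```

The harmonic mean punishes imbalance: classifier B's higher precision cannot fully compensate for its lower recall, which is why A wins on F1.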
Sometimes, you need to evaluate your dev set across some important categorical variable, such as geography. For instance, if you develop an app of cat pictures for cat lovers, you might want to know the error rate on pictures submitted by users from different geographies. However, tracking several numbers per classifier makes it challenging to compare algorithms. The best solution is to keep track of the average error rate across all geographies.
| Classifier | US | China | India | Other | Average |
|---|---|---|---|---|---|
| A | 3% | 7% | 5% | 9% | 6% |
| B | 5% | 6% | 5% | 10% | 6.5% |
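A sketch of collapsing the per-geography error rates into the single Average column shown above (classifier names and numbers are taken straight from the table):

```python
# Per-geography dev-set error rates from the table above.
errors = {
    "A": {"US": 0.03, "China": 0.07, "India": 0.05, "Other": 0.09},
    "B": {"US": 0.05, "China": 0.06, "India": 0.05, "Other": 0.10},
}

def average_error(per_geo):
    """Collapse per-geography error rates into one comparable number."""
    return sum(per_geo.values()) / len(per_geo)

for name, per_geo in errors.items():
    print(name, average_error(per_geo))  # A ≈ 0.06, B ≈ 0.065
```

You can still inspect the per-geography breakdown when debugging, but the single average is what you use to decide between A and B.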