The motivation for this blog post largely comes from taking the course Structuring your Machine Learning Project, taught by Andrew Ng. Having worked on 4 ML class projects and developed 2 proof-of-concept ML applications, I found this course highly relatable, so I felt an urge to write down some of my key takeaways along with a few add-on thoughts.
Building an ML model is highly empirical. First, you fit your training data to some kinds of models, see how they perform, and tune your models from there. However, the tuning part is a difficult one. If you just need a decent model, then with all the available open-source packages and online tutorials, it's easy to drop an existing model into your application with a basic understanding of how the algorithm works and a few lines of code. However, if you want to improve the performance of your basic models, where do you start? The reason ML is so hard is that it's exponentially hard to debug. I took this picture from Zayd's blog since it illustrates the point nicely.
There are so many things you could try to improve your ML models, and if you choose poorly, it's entirely possible to spend months charging in some direction only to realize that all those experiments did you no good. Thus, it's worth having a strategy in mind that points you toward the most promising things to try.
Orthogonalization is the idea of changing only one tuning direction at a time, instead of changing things that belong to multiple directions all at once. The analogy that Andrew uses is car design: if you have one knob to control the steering angle and a separate knob to control the speed, then it's easy to control the car. On the other hand, if you have a single knob that controls some combination of both, for example
0.3 x angle + 0.5 x speed
, then it would be very confusing to steer the car in the right direction at the right speed. Let's be grateful that our car manufacturers embrace the concept of orthogonalization.
To achieve a good ML system, there are 4 crucial tuning dimensions to look at. Your model should:
- fit the training set well on the cost function
- perform well on the dev set on the cost function
- perform well on the test set on the cost function
- perform well in the real world
Early stopping is one example of trying to control both the "training set" and "dev set" dimensions at the same time: stopping earlier makes the model fit the training set less well, while (hopefully) improving dev-set performance, so it is not an orthogonal knob.
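To make this concrete, here is a minimal sketch (my own, not from the course) of a patience-based early-stopping rule: it halts training based on dev-set error, which is exactly why it couples the two dimensions.

```python
def stop_epoch(dev_errors, patience=2):
    """Return the epoch with the lowest dev-set error, scanning until
    no improvement has been seen for `patience` consecutive epochs."""
    best_err = float("inf")
    best_epoch = 0
    for epoch, err in enumerate(dev_errors):
        if err < best_err:
            best_err, best_epoch = err, epoch
        elif epoch - best_epoch >= patience:
            break  # dev error stopped improving: halt training here
    return best_epoch

# Dev error improves until epoch 2, then degrades: stop there.
print(stop_epoch([0.30, 0.25, 0.24, 0.26, 0.27]))  # → 2
```

Note that the stopping decision depends only on dev-set error, while the effect (fewer training epochs) hits training-set fit.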
Key takeaway
Pick a single evaluation metric, which can come from averaging candidate metrics or from some kind of combined metric (e.g. F1 score, area under the ROC curve)
Having a well-defined dev set and a single evaluation metric speeds up the iterative process of improving ML algorithms.
But there are so many evaluation metrics, and if your model needs more than one metric to indicate its performance, try to find a way to combine them. For instance, for image classification it's good practice to consider both Precision and Recall, but if the results come out as in the table below, how do you know which classifier is better? The best approach is to combine your original evaluation metrics into one; in this case, the F1 score.
| Classifier | Precision | Recall | F1 score |
|---|---|---|---|
| A | 95% | 90% | 92.4% |
| B | 98% | 85% | 91.0% |
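The F1 score is the harmonic mean of precision and recall; here is a quick sketch to reproduce the values in the table:

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# Reproduce the table above (values in %):
print(round(f1_score(0.95, 0.90) * 100, 1))  # → 92.4  (classifier A)
print(round(f1_score(0.98, 0.85) * 100, 1))  # → 91.0  (classifier B)
```

The harmonic mean punishes imbalance: classifier B's higher precision cannot fully compensate for its lower recall, which is why A wins on F1.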
Sometimes, you need to evaluate your dev set across some important categorical variable, such as geography. For instance, if you develop an app of cat pictures for cat lovers, you might want to know the error rate on pictures submitted by users from different geographies. However, tracking several numbers per classifier makes it challenging to compare algorithms. The best solution is to keep track of the average error rate across all geographies.
| Classifier | US | China | India | Other | Average |
|---|---|---|---|---|---|
| A | 3% | 7% | 5% | 9% | 6% |
| B | 5% | 6% | 5% | 10% | 6.5% |
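A sketch of collapsing the per-geography error rates into the single Average column shown above (classifier names and numbers are taken straight from the table):

```python
# Per-geography dev-set error rates from the table above.
errors = {
    "A": {"US": 0.03, "China": 0.07, "India": 0.05, "Other": 0.09},
    "B": {"US": 0.05, "China": 0.06, "India": 0.05, "Other": 0.10},
}

def average_error(per_geo):
    """Collapse per-geography error rates into one comparable number."""
    return sum(per_geo.values()) / len(per_geo)

for name, per_geo in errors.items():
    print(name, average_error(per_geo))  # A ≈ 0.06, B ≈ 0.065
```

You can still inspect the per-geography breakdown when debugging, but the single average is what you use to decide between A and B.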