Seven Life Lessons from Machine Learning


I spend a lot of time thinking about machine learning, and I give a lot of thought to life. When the two collide, I find that certain machine learning lessons apply to everyday life as well.

Here are seven of them. Although I expect most readers are familiar with the machine learning concepts, I start each lesson with a brief explanation.

Cleaning up your data: Take a look at what you’re consuming.
We clean data so that our downstream analysis or machine learning is correct. As Randy Au points out, data cleaning isn’t grunt work; it is the work.

We don’t use data until we’ve explored and cleaned it first. Similarly, we should evaluate and filter all of life’s inputs before consuming them.
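As a rough illustration of "explore and clean before use", here is a minimal sketch in plain Python. The records, the field names, and the cleaning rules (trim names, drop negative ages, fill missing ages with the median, remove duplicates) are all assumptions made up for this example; real cleaning rules depend on the data.

```python
from statistics import median

# Hypothetical raw records; None and negative ages stand in for dirty data.
raw_rows = [
    {"name": " Alice ", "age": 34},
    {"name": "bob", "age": None},
    {"name": "Alice", "age": 34},
    {"name": "Carol", "age": -1},
    {"name": "dave", "age": 51},
]


def clean(rows):
    """Normalize names, drop invalid ages, fill missing ages, remove duplicates."""
    normalized = []
    for row in rows:
        name = row["name"].strip().title()
        age = row["age"]
        if age is not None and age < 0:
            continue  # a negative age is invalid, so the row is dropped
        normalized.append({"name": name, "age": age})

    # Fill missing ages with the median of the known ages (an assumed rule).
    known_ages = [r["age"] for r in normalized if r["age"] is not None]
    fill_value = median(known_ages)

    cleaned, seen = [], set()
    for row in normalized:
        age = fill_value if row["age"] is None else row["age"]
        key = (row["name"], age)
        if key in seen:
            continue  # skip exact duplicates
        seen.add(key)
        cleaned.append({"name": row["name"], "age": age})
    return cleaned


cleaned_rows = clean(raw_rows)
```

The same habit carries over to the rest of this lesson: look at the input first, decide what is noise, and only then consume it.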

Take food, for example. How often do we opt for something simply because it is readily available and easy to prepare? Until a few years ago, I ate a bowl of Sugary-Os cereal every day. Now that I’m more mindful of my family’s history of diabetes, I’m more selective and pay closer attention to nutritional content. As I get older and my metabolism slows, I still have to make a deliberate effort to eat healthily and avoid junk food.

The same goes for content. News outlets and social media rank information by virality and advertising dollars. The result is a faster spread of “empty calorie info-bites” that are easy to consume but do not enrich us. There is also a lot of misinformation out there. Some content is offensive or even toxic, and attempts to engage with it don’t always go well. Just filter it out for the sake of your sanity. Curate your news sources and the accounts you follow on social media.

Relationships are a final example. We all know people who do more harm than good in our lives. They either distract us or get in the way of our constructive habits. Despite our best attempts to make things work, some people talk behind our backs and play manipulative games. Letting go of them makes room for more fruitful relationships.

Low vs. high signal data: Seek to disconfirm and update.
In addition to filtering out noise, we also want to find data that shifts our decision boundary.

We begin with two easily distinguishable clusters and a linear decision boundary with a large margin in the left image above. (The support vectors are the circled points — one blue, two red.) With a single new data point, the decision boundary shifts dramatically and the margin shrinks in the centre picture. On the right, we discover that a non-linear decision boundary (and a soft margin) works better as we collect more data.

We should actively seek out data that updates our beliefs, theories, and decision boundaries in our daily lives. It’s the polar opposite of confirmation bias.

True ignorance is not the absence of knowledge, but the refusal to acquire it. - Karl Popper

Take feedback, for example. Positive feedback is like adding more data on the correct side of the decision boundary: it’s always appreciated, yet it doesn’t help much with improvement. I’m more interested in negative feedback so that I can improve my life algorithm.
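To make the feedback analogy concrete, here is a minimal sketch that swaps in a perceptron for the support vector machine described above, because its update rule is short enough to read at a glance. The starting weights and the two points are hypothetical. A point on the correct side of the boundary leaves the model unchanged; only a point on the wrong side moves it.

```python
def perceptron_update(weights, bias, point, label, lr=1.0):
    """Return updated (weights, bias); only a misclassified point changes them."""
    score = sum(w * x for w, x in zip(weights, point)) + bias
    if label * score > 0:
        return weights, bias  # correct side of the boundary: nothing to learn
    new_weights = [w + lr * label * x for w, x in zip(weights, point)]
    return new_weights, bias + lr * label


weights, bias = [1.0, 0.0], 0.0  # hypothetical starting boundary: x1 = 0

# "Positive feedback": a point already on the correct side.
same_w, same_b = perceptron_update(weights, bias, point=[2.0, 1.0], label=1)

# "Negative feedback": a point on the wrong side of the boundary.
new_w, new_b = perceptron_update(weights, bias, point=[1.0, 3.0], label=-1)
```

Here `same_w` and `same_b` are identical to the starting values, while `new_w` and `new_b` differ, which is the sense in which disconfirming data carries more signal.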

Balance Explore-Exploit for greater long-term reward

We face the exploration-exploitation trade-off in reinforcement learning. We can either explore to gain more information (e.g., transition probabilities, rewards) that could lead to better decisions and rewards in the future, or we can exploit current information and make the best decision for an optimal reward now.

We want to find the best option as quickly as possible. However, committing to a solution too early, without enough exploration, can leave us stuck in a local optimum. Exploring endlessly without a plan, on the other hand, is wasteful. It’s difficult to strike the right balance.
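One common way to balance the two is an epsilon-greedy strategy on a multi-armed bandit: explore a random option with a small probability, otherwise exploit the best option seen so far. The sketch below is illustrative only; the reward means, the exploration rate of 0.1, and the Gaussian noise are assumptions chosen for this example.

```python
import random


def epsilon_greedy(true_means, epsilon=0.1, steps=5000, seed=0):
    """Run an epsilon-greedy agent on a Gaussian bandit.

    Returns the estimated mean reward and the pull count for each arm.
    """
    rng = random.Random(seed)
    n_arms = len(true_means)
    estimates = [0.0] * n_arms
    counts = [0] * n_arms
    for _ in range(steps):
        if rng.random() < epsilon:
            arm = rng.randrange(n_arms)  # explore: try a random option
        else:
            # exploit: pick the option with the best estimate so far
            arm = max(range(n_arms), key=lambda a: estimates[a])
        reward = rng.gauss(true_means[arm], 1.0)
        counts[arm] += 1
        # Update the running average of rewards for the chosen arm.
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
    return estimates, counts


# Hypothetical dinner options with different average "tastiness".
estimates, counts = epsilon_greedy(true_means=[1.0, 2.0, 4.0])
```

With `epsilon=0` the agent never looks beyond the first option that seems acceptable; with `epsilon=1` it never commits. Values in between trade a little short-term reward for better long-term choices.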

It’s the same in everyday life. Half a year ago, I decided that the pork belly bento from an Asian supermarket was the cheapest and tastiest dinner option within walking distance ($7 after 5pm). But if I only tried a few places back then and haven’t tried other options recently, how do I know it still has the highest utility? (By the way, if you know of any cheap, fuss-free takeout options in downtown Seattle, please let me know.)

On a more serious note, let’s talk about careers. It’s perfectly fine if you’ve just graduated and haven’t decided what you want to do with your life. Don’t be too hard on yourself if you didn’t drop out of college to start a multibillion-dollar business. Take the time to explore different career options and work out which one is right for you. Don’t choose a career only because your father, society, or LinkedIn told you to.

And once you’ve figured it out, go all in. Matthew McConaughey decided to go to film school instead of law school, but he wasn’t sure how his father would react. Instead of disapproval, he received three of the most important words ever: “Don’t half-ass it.”

Once we’ve moved from exploration to exploitation, we should commit and see it through. It’s wasteful to keep spending a lot of time exploring on the side. Nonetheless, look up from time to time to see if there is a better option. As with most things, finding the right balance is crucial.

(Note that not everything needs to be thoroughly explored. Most decisions are two-way doors that can be reversed. It’s also fine to settle for “good enough” when you’re short on time and energy.)

Books and articles are cheat codes for transfer learning.
Pre-training is the phase in deep learning where we train our models on a separate, typically larger dataset before applying transfer learning (also known as fine-tuning) to our particular problem and data.

For computer vision tasks, we use models that have been pre-trained on ImageNet. Recent models for language include an unsupervised pre-training stage. We use transfer learning to adapt pre-trained models for our particular problems by adding our own final layers, providing task-specific inputs and labels, and fine-tuning model weights.
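A toy sketch of the idea, under heavy simplification: instead of a deep network with new final layers, it uses a one-variable linear model, "pre-trains" it on a larger hypothetical dataset, and then fine-tunes it for a few steps on a small, slightly different target dataset. The datasets, learning rate, and step counts are all made up for illustration. The comparison is with training from scratch for the same small number of steps.

```python
def fit_line(data, w=0.0, b=0.0, lr=0.05, steps=20):
    """Fit y = w * x + b by gradient descent on mean squared error."""
    n = len(data)
    for _ in range(steps):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
        w, b = w - lr * grad_w, b - lr * grad_b
    return w, b


def mse(data, w, b):
    """Mean squared error of the line y = w * x + b on the data."""
    return sum((w * x + b - y) ** 2 for x, y in data) / len(data)


# Hypothetical large "pre-training" dataset: y = 2x + 1.
big = [(k / 10, 2 * (k / 10) + 1) for k in range(-20, 21)]
# Hypothetical small target dataset from a related task: y = 2x + 1.5.
small = [(-1.0, -0.5), (0.0, 1.5), (1.0, 3.5)]

pre_w, pre_b = fit_line(big, steps=500)                   # pre-training
tuned_w, tuned_b = fit_line(small, pre_w, pre_b, steps=5)  # fine-tuning
scratch_w, scratch_b = fit_line(small, steps=5)            # no pre-training

tuned_error = mse(small, tuned_w, tuned_b)
scratch_error = mse(small, scratch_w, scratch_b)
```

Because the pre-trained weights already sit close to the target solution, the fine-tuned model reaches a lower error than the model trained from scratch with the same five steps.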

Similarly, we can think of school as a generalised form of pre-training. We’ve been pre-trained in general theory, methods, and subject knowledge (math, science, humanities, etc.). However, after we graduate (or even before), we must fine-tune our skills for specific tasks such as developing apps, launching companies, and building teams. The point is that school is really just pre-training; don’t think of graduation as the end of your education.

Transfer learning is a kind of cheat code in machine learning. It also works in real life; I’ve found that books are by far the best pre-trained models. Books contain the weights and biases of the great thinkers before us. They condense a lifetime’s worth of knowledge into a few hundred pages in an easily digestible format. Papers are another excellent learning resource. Years of research, experimentation, and learning are distilled into a single paper. Those that include ablation studies are particularly interesting to me.

Is it any surprise that the most popular and knowledgeable people are avid readers?

Mark Twain once said, “The man who does not read has no advantage over the man who cannot read.”

Iterations: Find reps that you can handle and iterate quickly.

Iteration is at the heart of many machine learning techniques. Gradient boosted trees grow new trees iteratively, fitting each one to the pseudo-residuals (i.e., the remaining error) of the previous trees. Gradient descent is an iterative optimization technique that repeatedly adjusts parameters to reduce the error. Deep learning models are trained iteratively over epochs, with each epoch passing the entire dataset through the network.
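As a small illustration of iteration at work, here is a minimal gradient descent sketch. The error function f(x) = (x - 3)^2, the learning rate, and the starting point are hypothetical choices for this example. Each pass takes a small step against the gradient, and the error shrinks a little every time.

```python
def gradient_descent(grad, start, lr=0.1, steps=100):
    """Repeatedly step against the gradient and return every point visited."""
    x = start
    path = [x]
    for _ in range(steps):
        x = x - lr * grad(x)
        path.append(x)
    return path


# Hypothetical error function f(x) = (x - 3)^2, whose gradient is 2 * (x - 3).
path = gradient_descent(grad=lambda x: 2 * (x - 3), start=0.0)
errors = [(x - 3) ** 2 for x in path]
```

No single step gets to the answer, but the sequence in `errors` keeps falling, and after enough repetitions `path[-1]` is very close to the minimum at 3.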

Iteration is also a part of life. We won’t fully understand a paper if we read it only once. (I normally need three passes.) The first machine learning model we train will most likely not beat the baseline; we will need to iterate and try different data, features, objective functions, parameters, and so on. Our first A/B test will most likely fail. Don’t be concerned if you don’t get it right the first time; after all, who does? As long as we keep iterating, we’ll keep improving.

In a similar way, don’t expect to achieve success overnight. Angry Birds’ creators tried 51 times before landing on a game that worked. Sir James Dyson tried 5,126 times over the course of 15 years before finally getting his vacuum cleaner to work. If you plan to try something, make sure you can handle the failures and iterations.

I start early and I stay late, day after day, year after year. It took me 17 years and 114 days to become an overnight success. - Lionel Messi

Aside from the number of iterations, the speed at which we iterate is also important. We can iterate more quickly by automating our machine learning experimentation workflow. Launching early, and getting customer feedback sooner, lets us improve more quickly. If we’re not embarrassed by version 1, we probably launched too late. Even Apple, the perfectionist, launched the iPhone without copy-paste.

The rate of iteration is the most important indicator of performance for a young startup. — Sam Altman

Overfitting: Focus on intuition and keep learning.
When our machine learning models overfit the training data, they are unable to generalise to new data. As a result, the validation error is high even though the training error is low. We guard against overfitting by evaluating our models on a previously unseen holdout set.
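A minimal sketch of that check, using made-up data: the true label is 1 when x is at least 5, but two training labels are noisy. A one-nearest-neighbour model memorizes the training set, noise included, while a simple threshold rule (chosen by hand for this example, not learned) ignores the noise. All names and numbers here are hypothetical.

```python
def one_nearest_neighbor(train):
    """A model that memorizes the training set and copies the closest label."""
    def predict(x):
        _, nearest_label = min(train, key=lambda pair: abs(pair[0] - x))
        return nearest_label
    return predict


def threshold_rule(x):
    """A simpler model: a single threshold, set by hand for this sketch."""
    return 1 if x >= 5 else 0


def error_rate(predict, data):
    """Fraction of examples the model gets wrong."""
    return sum(predict(x) != label for x, label in data) / len(data)


# Hypothetical data: the labels at x = 3 and x = 7 in the training set are noisy.
train = [(1, 0), (2, 0), (3, 1), (4, 0), (6, 1), (7, 0), (8, 1), (9, 1)]
holdout = [(0.5, 0), (2.6, 0), (3.4, 0), (5.5, 1), (6.8, 1), (7.3, 1)]

memorizer = one_nearest_neighbor(train)
memorizer_train_error = error_rate(memorizer, train)
memorizer_holdout_error = error_rate(memorizer, holdout)
simple_train_error = error_rate(threshold_rule, train)
simple_holdout_error = error_rate(threshold_rule, holdout)
```

The memorizer is perfect on the data it has seen and poor on the holdout set, while the simpler rule makes a few training mistakes but holds up on unseen data. Only the holdout numbers reveal the difference.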

Similarly, we should avoid overfitting when we study by concentrating on comprehension and intuition. While memorising problems and answers can take us a long way in school, it won’t help us when we’re confronted with new, real-world problems.

I don’t know what’s the matter with people: they don’t learn by understanding, but by rote or some other way. Their knowledge is so fragile! - Richard P. Feynman

Our models can’t generalise to new data if they’re trained only on stale data. Likewise, if we stop learning, we will be unable to adapt as technology advances. Consider attempting machine learning in Excel; it’s possible (I believe), but Python makes it much simpler and more efficient.

How do we avoid overfitting in our daily lives? For me, adopting a beginner’s mind works well. It’s a concept from Zen Buddhism in which we approach life with an open mind, eagerness, and no preconceptions. We think like a beginner, stay curious, and approach new ideas as a student, even if they don’t fit our paradigm. With a beginner’s mind, we keep learning and updating our algorithm, reducing the risk of overfitting in life.

The illiterate of the twenty-first century will not be those who cannot read and write, but those who cannot learn, unlearn, and relearn. - Alvin Toffler

Ensembling: Diversity is a source of power.
Ensembling is a machine learning technique that combines several models to produce better results than any single model. We see this in random forests, where many different trees are trained and their predictions are combined. In essence, each model compensates for the flaws of the others.
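Here is a minimal sketch of the idea using a plain majority vote rather than a full random forest. The labels and the three models’ predictions are invented for illustration; the only property that matters is that each model is wrong on different examples.

```python
from collections import Counter


def majority_vote(predictions):
    """Combine one prediction per model into a single label by majority vote."""
    return Counter(predictions).most_common(1)[0][0]


# Hypothetical true labels and three hypothetical models' predictions.
labels = [1, 0, 1, 1, 0, 1]
model_a = [0, 0, 1, 1, 0, 1]  # wrong on example 0
model_b = [1, 1, 1, 1, 0, 0]  # wrong on examples 1 and 5
model_c = [1, 0, 0, 1, 1, 1]  # wrong on examples 2 and 4

# For each example, let the three models vote.
ensemble = [majority_vote(votes) for votes in zip(model_a, model_b, model_c)]


def accuracy(preds):
    """Fraction of predictions that match the true labels."""
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)
```

Because the mistakes don’t overlap, every error is outvoted by the other two models, and `accuracy(ensemble)` is higher than the accuracy of any individual model. If the three models all made the same mistakes, voting would not help, which is why diversity matters.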

Diversity is power in life, too. When forming teams, I prefer diverse demographics, educations, experiences, and skills; diverse viewpoints help us better relate to and serve customers. To build machine learning systems, we need skills in data pipelines, machine learning, software engineering, DevOps, product, and more. It’s not enough to be good at one thing. The strengths of one person balance out the weaknesses of another.

Differences, not similarities, are the source of power, according to Stephen R. Covey.

We get some of the best ideas when we transcend groupthink and combine different ideas, or when we combine our knowledge of different subjects to create a new superpower. Scott Adams, for example, created Dilbert by combining his ability to draw, his sense of humor, and his business knowledge.

Creativity is just connecting things. When you ask creative people how they did something, they feel a little guilty because they didn’t really do it; they just saw something. It seemed obvious to them after a while. That’s because they were able to connect experiences they’ve had and synthesise new things.

And the reason they were able to do that was that they’ve had more experiences or have thought more about their experiences than other people.

Unfortunately, that’s too rare a commodity. A lot of people in our industry haven’t had very diverse experiences. So they don’t have enough dots to connect, and they end up with very linear solutions without a broad perspective on the problem. - Steve Jobs

What other similarities do you see between machine learning and life? Reply to this tweet or leave a comment in the section below.