Thoughts and Theory

Photo Credit: Chase Baker

The gold standard for machine learning research is the ‘sequential’ model of experimentation: you have a baseline, your experiment and a fixed, predetermined test set. You evaluate your baseline on the test set, get a baseline figure. You then run your experiment on the test set, get another figure. Then you compare the two. Assuming that you publish all these artifacts, anyone can presumably reproduce the results. This is good science.
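The sequential protocol described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the toy test set, the two stand-in models, and the helper names `accuracy` and `compare` are hypothetical and do not come from the article.

```python
from typing import Callable, Sequence, Tuple

# Hypothetical toy format: each example is a (feature, label) pair.
Example = Tuple[float, int]
Model = Callable[[float], int]


def accuracy(model: Model, test_set: Sequence[Example]) -> float:
    """Fraction of test examples the model labels correctly."""
    correct = sum(1 for x, y in test_set if model(x) == y)
    return correct / len(test_set)


def compare(baseline: Model, experiment: Model,
            test_set: Sequence[Example]) -> Tuple[float, float, float]:
    """Evaluate both models on the same fixed test set and report the difference."""
    base = accuracy(baseline, test_set)
    exp = accuracy(experiment, test_set)
    return base, exp, exp - base


# Fixed, predetermined test set (assumed data, stored as an immutable tuple).
TEST_SET: Tuple[Example, ...] = ((0.1, 0), (0.4, 0), (0.6, 1), (0.9, 1))


def baseline_model(x: float) -> int:
    """Stand-in baseline: always predicts class 1."""
    return 1


def experiment_model(x: float) -> int:
    """Stand-in experiment: thresholds the feature at 0.5."""
    return int(x > 0.5)


base_acc, exp_acc, delta = compare(baseline_model, experiment_model, TEST_SET)
print(f"baseline={base_acc:.2f} experiment={exp_acc:.2f} delta={delta:+.2f}")
```

The point of the sketch is that both figures come from the same frozen test set, so anyone holding the models and the data can rerun the comparison.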

Reproducibility has always been the cornerstone of scientific progress and the subject of numerous workshops and calls to action in machine learning specifically. While attempts at improving the reproducibility…

Fairness and Bias

Robot — Photo by Possessed Photography on Unsplash

Have you ever tried to change someone’s mind? Of course you have. After all, it’s fundamentally what much of communication is about. It’s also incredibly difficult.

It’s even harder when it comes to the deep-seated political, religious, or social beliefs that have shaped us, because they tend to be tied to our own identity. Many beliefs are even completely unconscious and can only be elicited through indirect means. There is no telling if the job application I’m handing out will be treated differently because of my race or gender since the person I hand it to may not even be…

Heart in a bubble. Credit: Rawpixel

Karl Popper is probably the most underappreciated philosopher of the modern era. His writings provide a lens through which to examine many of the key social issues of today, from fake news and the anti-science movement all the way to the controversies around power, religion, race, and gender that are gripping our world.

His most famous work is, of course, on the philosophy of science. While I have no hope of doing justice to the breadth of his commentary on the topic, his key argument is both simple and exceedingly important:

You can only ever disprove a theory.

You can never…

The ghost in the Gaussian. Drawing from the author.

Standards for reporting performance metrics in machine learning are, alas, not often discussed. Since there doesn’t appear to be an explicit, widely shared agreement on the topic, I thought it might be interesting to offer the standard that I have been advocating, and which I attempt to follow as much as possible. It derives from this simple premise, which was drilled into me by my science teachers since middle school:

A general rule for scientific reporting is that every digit you write down ought to be ‘true,’ for whichever definition of ‘true’ is applicable.

Let’s examine what this means for…

Credit: RawPixel

In honor of the imminent release of ‘Software Engineering at Google’, which I highly recommend, I thought I’d relate a tale of how software evolution and feature creep can go wrong in ways which, while feeling great at every step of the journey, yield a net outcome a decade later that is a disaster with a surprisingly large blast radius. This is my modest parallel to Tony Hoare’s ‘billion dollar mistake’, or Dennis Ritchie’s ‘most expensive one-byte mistake’ and, coincidentally, it involves both pointers and zero-terminated data structures.

It all begins innocently enough. The year is 2009. A colleague of mine…

The field of Machine Learning research has derived much of its dynamism from the pursuit of one ultimate goal: the quest for Artificial General Intelligence (AGI). This long-horizon objective fuels people’s passions by managing to always feel just barely out of reach, yet paradoxically ungraspable for deeply fundamental reasons. The possibility that someone may create a completely new form of sentience has led some to elevate the quest for AGI to a quasi-religious purpose — I was recently told un-ironically ‘I am willing to give up my life to solve AGI.’ …

The Game of Authors
Source: Google Arts and Culture

Credit assignment and decisions about paper authorship are a surprisingly difficult topic to navigate, particularly for junior researchers. I distinctly recall working really hard on my first paper, reflexively listing myself as last author — given my initials, I’m accustomed to being last on lists of names — and discovering through the oblique comments of my co-authors that name ordering on papers was actually a thing. That notion seemed rather comical to me at the time, but I quickly learned that people, bizarrely, really cared about it.

General Principles

Most professional organizations have a standard for authorship (for instance ACM, IEEE, APA

Photo by Jamie Street

Collaborations are at the heart of what we do as researchers, and success in one’s field often has as much to do with forging effective collaborations as with coming up with innovative ideas. Yet collaborations are such fragile things that it’s sometimes a wonder they can be sustained at all in a scientific environment often so ripe for petty conflict and competition.

Nothing is more suspicious than a single-author paper.

This may seem surprising to those accustomed to the ‘lone genius’ narrative that the press loves to perpetuate in science, but to the experienced researcher, single-author papers tend to raise a number…

While I was in London, taking part in a conversation on how technology leaders can help meet the UN Sustainable Development Goals, a very interesting workshop was being held in Long Beach: ‘Climate Change: How Can AI Help?’ I wish I had been able to attend. Headlining the workshop was a very detailed review paper on the topic, titled Tackling Climate Change with Machine Learning, which I highly recommend reading. …

One of the most familiar settings for a machine learning engineer is having access to a lot of data, but modest resources to annotate it. Everyone in that predicament eventually goes through the logical steps of asking themselves what to do when they have limited supervised data, but lots of unlabeled data, and the literature appears to have a ready answer: semi-supervised learning.

And that’s usually when things go wrong.

Historically, semi-supervised learning has been one of those rabbit holes that every engineer goes through as a rite of passage only to discover a newfound appreciation for plain old data…

Vincent Vanhoucke

I am a Principal Scientist at Google, working on Machine Learning and Robotics.
