Heart in a bubble. Credit: Rawpixel

Karl Popper is probably the most underappreciated philosopher of the modern era. His writings provide a lens through which to examine many of the key social issues of today, from fake news and the anti-science movement all the way to the controversies around power, religion, race, and gender that are gripping our world.

His most famous work, of course, concerns the philosophy of science. While I cannot hope to do justice to the breadth of his commentary on the topic, his key argument is both simple and exceedingly important:

You can never call a theory ‘true,’ all you can do is relentlessly question it, test it, and observe whether or not its predictions align with reality. If they don’t, your theory is wrong; if they do, you’ve merely increased the body of evidence that the theory may be right. …

An End to Chasing Statistical Ghosts

The ghost in the Gaussian. Drawing from the author.

Standards for reporting performance metrics in machine learning are, alas, not often discussed. Since there doesn’t appear to be an explicit, widely shared agreement on the topic, I thought it might be interesting to offer the standard that I have been advocating, and which I attempt to follow as much as possible. It derives from this simple premise, which was drilled into me by my science teachers since middle school:

Let’s examine what this means for statistical quantities such as test performance. When you write the following statement in a scientific…
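The principle at stake — that a reported metric means little without its uncertainty — can be made concrete with a quick sketch. The function below computes a normal-approximation (Wald) confidence interval on test accuracy; the numbers are illustrative, not from any real experiment:

```python
import math

def accuracy_confidence_interval(correct, total, z=1.96):
    """Normal-approximation (Wald) confidence interval on test accuracy.

    z=1.96 corresponds to a 95% interval. For small test sets or
    accuracies near 0 or 1, a Wilson or Clopper-Pearson interval
    is a better choice; this is the simplest sketch of the idea.
    """
    p = correct / total
    half_width = z * math.sqrt(p * (1 - p) / total)
    return p - half_width, p + half_width

# Illustrative example: 870 correct out of 1000 test examples.
lo, hi = accuracy_confidence_interval(870, 1000)  # ≈ (0.849, 0.891)
```

On a 1000-example test set, an accuracy of 87.0% carries roughly a ±2-point uncertainty — which is exactly why quoting it as "87.0%" against a baseline of "86.5%" can amount to chasing a statistical ghost.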

A tale of evolving code and unintended consequences

Credit: RawPixel

In honor of the imminent release of ‘Software Engineering at Google’, which I highly recommend, I thought I’d relate a tale of how software evolution and feature creep can go wrong in ways which, while feeling great at every step of the journey, yield a net outcome a decade later that is a disaster with a surprisingly large blast radius. This is my modest parallel to Tony Hoare’s ‘billion dollar mistake’, or Dennis Ritchie’s ‘most expensive one-byte mistake’ and, coincidentally, it involves both pointers and zero-terminated data structures.

It all begins innocently enough. The year is 2009. A colleague of mine has a large number of dot products that need to be sped up to improve the scalability of a large-scale optimization problem. At the time, floating-point computation isn’t a common bottleneck for our workloads, and we don’t have off-the-shelf solutions readily available: bringing in a BLAS library is overkill, and there are simpler and definitely more fun things to do for someone like myself, who loves to write gnarly math code and mess around with vectorized Intel instructions. …
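The gap between a naive scalar loop and a vectorized dot product is easy to demonstrate. The story above involves hand-written Intel intrinsics in C; as an illustrative stand-in, here is the same contrast in Python, where NumPy plays the role of the vectorized implementation:

```python
import numpy as np

def dot_loop(a, b):
    """Naive scalar dot product: one multiply-add per element.

    This is the kind of loop that vectorized (SIMD) code replaces
    by processing several elements per instruction.
    """
    total = 0.0
    for x, y in zip(a, b):
        total += x * y
    return total

a = np.arange(4, dtype=np.float32)   # [0., 1., 2., 3.]
b = np.ones(4, dtype=np.float32)
assert dot_loop(a, b) == np.dot(a, b) == 6.0
```

On large vectors, the vectorized version is typically one to two orders of magnitude faster than the scalar loop — the scalability gain the colleague in this story was after.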

Artificial General Intelligence’s lesser-known older cousin

The field of Machine Learning research has derived much of its dynamism from the pursuit of one ultimate goal: the quest for Artificial General Intelligence (AGI). This long-horizon objective fuels people’s passions by managing to always feel just barely out of reach, yet remaining ungraspable for deeply fundamental reasons. The possibility that someone may create a completely new form of sentience has led some to elevate the quest for AGI to a quasi-religious purpose — I was recently told, un-ironically, ‘I am willing to give up my life to solve AGI.’ …

Navigating Academic Credit and Authorship

The Game of Authors
Source: Google Arts and Culture

Credit assignment and decisions about paper authorship are surprisingly difficult topics to navigate, particularly for junior researchers. I distinctly recall working really hard on my first paper, reflexively listing myself as last author — given my initials, I’m accustomed to being last on lists of names — and discovering through the oblique comments of my co-authors that name ordering on papers was actually a thing. That notion seemed rather comical to me at the time, but I quickly learned that people, bizarrely, really cared about it.

General Principles

Most professional organizations have a standard for authorship (for instance ACM, IEEE, APA, ICMJE). …

You don’t.

Photo by Jamie Street

Collaborations are at the heart of what we do as researchers, and success in one’s field often has as much to do with forging effective collaborations as with coming up with innovative ideas. Yet collaborations are such fragile things that it’s sometimes a wonder they can be sustained at all in a scientific environment so often ripe for petty conflict and competition.

This may seem surprising to those accustomed to the ‘lone genius’ narrative that the press loves to perpetuate in science, but to the experienced researcher, single-author papers tend to raise a number of red flags and make one’s quack-o-meter go into overdrive. …

In Search of Data Science’s Best Levers for Climate Action

While I was in London, taking part in a conversation on how technology leaders can help meet the UN Sustainable Development Goals, a very interesting workshop was being held in Long Beach: ‘Climate Change: How Can AI Help?’ I wish I had been able to attend. Headlining the workshop was a very detailed review paper on the topic, titled Tackling Climate Change with Machine Learning, which I highly recommend reading. …

Time to dust off that unlabeled data?


One of the most familiar settings for a machine learning engineer is having access to a lot of data, but modest resources to annotate it. Everyone in that predicament eventually goes through the logical steps of asking themselves what to do when they have limited supervised data, but lots of unlabeled data, and the literature appears to have a ready answer: semi-supervised learning.

And that’s usually when things go wrong.

Historically, semi-supervised learning has been one of those rabbit holes that every engineer goes through as a rite of passage only to discover a newfound appreciation for plain old data labeling. …
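To ground what that rabbit hole looks like in practice, here is a minimal sketch of pseudo-labeling, one of the most common flavors of semi-supervised learning: train on the labeled set, adopt the model’s most confident predictions on unlabeled data as if they were labels, and repeat. All names, thresholds, and data below are illustrative:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def self_train(X_lab, y_lab, X_unlab, threshold=0.95, rounds=5):
    """Pseudo-labeling loop (illustrative sketch).

    Each round: fit on the current labeled pool, then promote
    unlabeled points whose predicted class probability exceeds
    `threshold` into the labeled pool with their predicted label.
    """
    X, y = X_lab.copy(), y_lab.copy()
    for _ in range(rounds):
        model = LogisticRegression().fit(X, y)
        if len(X_unlab) == 0:
            break
        probs = model.predict_proba(X_unlab)
        confident = probs.max(axis=1) >= threshold
        if not confident.any():
            break  # no predictions confident enough to promote
        X = np.vstack([X, X_unlab[confident]])
        y = np.concatenate([y, probs[confident].argmax(axis=1)])
        X_unlab = X_unlab[~confident]
    return model
```

The loop’s failure mode is also its lesson: when the initial model is weak, confident-but-wrong pseudo-labels get promoted and the errors compound — which is how many engineers end up back at plain old data labeling.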

What training dogs can teach you about RL

My team just spent a day hanging out with some Very Good Boys and some Very Good Girls, all in the name of research.

One Very Good Girl, courtesy of Dallas Hamilton.

Much of machine learning research purports to take inspiration from neuroscience, psychology and child development, touting concepts such as Hebbian learning, curiosity-driven exploration or curriculum learning as justification — and, more often than not, a post-rationalization — for the latest twist in architecture design or learning theory.

This ignores the fact that the modern machine learning toolkit has neither a substrate that comes close to approximating the brain’s neurophysiology, nor the high level of consciousness or intellectual development of even a small child. …

A geek’s peek into the blue bin

As a kid, I vividly recall roaming the woods near my house and stumbling upon giant trash piles, forcing me to face a truth that seemed simply absurd at the time: refuse simply accumulates. Reflecting back, I think I must have had an implicit mental model that the things we consumed were part of a mysterious cycle of reuse — though the word ‘recycling’ hadn’t yet entered the French vocabulary in the ’80s (the first recycling mandates date back to 1992). Growing up, I developed a strange fascination with trash. My father worked at a paper production plant that chiefly used recycled paper as source material, and I later worked briefly at a German recycling facility that doubled as a bottle production plant. …


Vincent Vanhoucke

I am a Principal Scientist at Google, working on Machine Learning and Robotics.
