Vincent Vanhoucke
1 min read · Jun 23, 2019

BERT is not really an auto-encoder, in the sense that its predictions for non-masked words are ignored during training: the loss is computed only at the masked positions. That’s a very important distinction. It also uses a (non-generative) next-sentence prediction objective, which is another form of self-supervision.
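To make the distinction concrete, here is a minimal sketch in plain Python, not BERT's actual implementation. The function names `masked_lm_loss` and `reconstruction_loss`, the `is_masked` flags, and the toy probabilities are all assumptions made up for illustration; real implementations work on tensors and choose which positions to predict with a more involved procedure.

```python
import math

def masked_lm_loss(log_probs, targets, is_masked):
    """Average negative log-likelihood over masked positions only.

    log_probs: one dict per position, mapping token -> log-probability
    targets:   the original token at each position
    is_masked: True where the position was selected for prediction
    """
    losses = [
        -lp[tok]
        for lp, tok, masked in zip(log_probs, targets, is_masked)
        if masked  # non-masked positions contribute nothing
    ]
    return sum(losses) / len(losses) if losses else 0.0

def reconstruction_loss(log_probs, targets):
    """For contrast: an auto-encoder-style loss that scores every position."""
    return sum(-lp[tok] for lp, tok in zip(log_probs, targets)) / len(targets)

# Toy sentence with only the middle word masked (hypothetical numbers).
targets = ["the", "cat", "sat"]
is_masked = [False, True, False]
good = [{"the": math.log(0.9)}, {"cat": math.log(0.5)}, {"sat": math.log(0.9)}]
# Same prediction at the masked position, much worse ones elsewhere.
bad = [{"the": math.log(0.1)}, {"cat": math.log(0.5)}, {"sat": math.log(0.1)}]

mlm_good = masked_lm_loss(good, targets, is_masked)
mlm_bad = masked_lm_loss(bad, targets, is_masked)
rec_good = reconstruction_loss(good, targets)
rec_bad = reconstruction_loss(bad, targets)
```

In this sketch `mlm_good` and `mlm_bad` are identical, because degrading the predictions at non-masked positions does not change the masked loss, whereas the reconstruction loss gets worse.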

Vincent Vanhoucke

I am a Distinguished Engineer at Waymo, working on Machine Learning and Robotics. Previously head of robotics research at Google DeepMind.