Jun 23, 2019
BERT is not really an auto-encoder, in the sense that predictions for non-masked tokens are ignored in the training loss. That’s a very important distinction. It also uses a (non-generative) next-sentence prediction task, which is another form of self-supervision.
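To make the point concrete, here is a minimal sketch (not BERT’s actual code) of how a masked-LM loss can be restricted to masked positions. It assumes PyTorch and the common convention of labeling non-masked positions with -100, which `cross_entropy` ignores by default; the tensor names and sizes are purely illustrative.

```python
import torch
import torch.nn.functional as F

vocab_size = 8
seq_len = 5

# Hypothetical model outputs and inputs for one short sequence.
logits = torch.randn(seq_len, vocab_size)              # predicted token scores per position
token_ids = torch.randint(0, vocab_size, (seq_len,))   # original (unmasked) tokens
masked = torch.tensor([0, 1, 0, 0, 1], dtype=torch.bool)  # which positions were masked out

labels = token_ids.clone()
labels[~masked] = -100  # ignore_index: non-masked positions contribute nothing to the loss

loss = F.cross_entropy(logits, labels)  # averaged over masked positions only
print(loss)
```

Only the two masked positions enter the loss here; the model’s (possibly good or bad) predictions for the other three tokens never affect the gradient, which is the sense in which BERT is not reconstructing its full input like a classic auto-encoder.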