Text and Code Embeddings

09 Jun 2022

Seminars and colloquia

Time

3:00pm

Venue

To receive Zoom room links, send an empty email to request.zoom.ox.ml.and.physics [AT] gmail [DOT] com
Online

Speaker(s)

Arvind Neelakantan

OpenAI

Seminar series

Machine learning and physics

For more information contact

sue.geddes@physics.ox.ac.uk

Abstract:

Text embeddings are useful features in many applications such as semantic search and computing text similarity. Previous work typically trains models customized for different use cases, varying in dataset choice, training objective and model architecture. In this work, we show that contrastive pre-training on unsupervised data at scale leads to high-quality vector representations of text and code. The same unsupervised text embeddings that achieve new state-of-the-art results in linear-probe classification also display impressive semantic search capabilities and sometimes even perform competitively with fine-tuned models. Similarly to text embeddings, we train code embedding models on (text, code) pairs, obtaining a 20.8% relative improvement over prior best work on code search.

About the speaker:

Arvind Neelakantan is a Research Lead and Manager at OpenAI, working on deep learning models for text and code. He received his PhD from UMass Amherst, where he was also a Google PhD Fellow. His work has received best paper awards at NeurIPS and at the Automated Knowledge Base Construction workshop.

