GPT-3 is an autoregressive language model with 175 billion parameters, 10 times more than any previous non-sparse language model. It reportedly achieves strong performance on many NLP datasets, as well as on several tasks that require on-the-fly reasoning or domain adaptation, such as unscrambling words, using a novel word in a sentence, and performing 3-digit arithmetic. At the same time, GPT-3’s few-shot learning still struggles on some tasks, and the model faces methodological issues related to training on large web corpora. GPT-3 can also generate samples of news articles that human evaluators have difficulty distinguishing from articles written by humans. This article by William Douglas Heaven of MIT Technology Review discusses GPT-3 in general.
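Few-shot learning here means the model is conditioned on a handful of solved examples placed directly in its prompt, with no gradient updates. A minimal sketch of how such a prompt might look for the word-unscrambling task (the example words, labels, and helper name are illustrative, not taken from the article, and no model or API is called):

```python
def build_few_shot_prompt(examples, query):
    """Join solved (scrambled, answer) pairs and a final query into one prompt.

    The model is expected to continue the text after the last 'Unscrambled:'.
    """
    lines = [f"Scrambled: {scrambled}\nUnscrambled: {answer}"
             for scrambled, answer in examples]
    # The query repeats the pattern but leaves the answer blank for the model.
    lines.append(f"Scrambled: {query}\nUnscrambled:")
    return "\n\n".join(lines)

# Hypothetical demonstration pairs followed by the word to solve.
examples = [("pplae", "apple"), ("nanaab", "banana")]
prompt = build_few_shot_prompt(examples, "rgaep")
print(prompt)
```

The prompt ends right after the final `Unscrambled:` label, so an autoregressive model naturally completes it with its guess for the unscrambled word.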