Question:
Transformer - Architecture
Author: Christian N
Answer:
Transformers are the standard architecture used in NLP (chatbots, translators).
• Contrary to LSTMs, they do not process the input sequentially -> high parallelization.
• They need positional encodings to specify the position of each word (see the first sketch below).
• They use multiple attention layers to keep track of important information across the sentence (see the attention sketch below).
• The output sentence is produced word by word.
• At each step, the output of the network is a probability distribution over all words in the dictionary, used to predict the next word in the sentence.
• The process stops only when <EOS> is predicted (see the decoding sketch below).
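A minimal NumPy sketch of the sinusoidal positional encoding from the original Transformer paper ("Attention Is All You Need"); the function name and the assumption of an even d_model are illustrative, not from the card:

```python
import numpy as np

def sinusoidal_positional_encoding(seq_len: int, d_model: int) -> np.ndarray:
    """PE[pos, 2i]   = sin(pos / 10000**(2i / d_model))
    PE[pos, 2i+1] = cos(pos / 10000**(2i / d_model))
    Assumes d_model is even."""
    positions = np.arange(seq_len)[:, np.newaxis]                # (seq_len, 1)
    div_terms = 10000.0 ** (np.arange(0, d_model, 2) / d_model)  # (d_model/2,)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(positions / div_terms)  # even dimensions
    pe[:, 1::2] = np.cos(positions / div_terms)  # odd dimensions
    return pe

# Added to the word embeddings before the first layer, e.g.:
# x = token_embeddings + sinusoidal_positional_encoding(seq_len, d_model)
```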
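The attention layers are built from scaled dot-product attention, softmax(Q K^T / sqrt(d_k)) V; a self-contained sketch assuming 2-D NumPy arrays of shape (seq_len, d_k):

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Compute softmax(Q @ K.T / sqrt(d_k)) @ V with a numerically
    stable row-wise softmax."""
    d_k = Q.shape[-1]
    scores = Q @ K.swapaxes(-2, -1) / np.sqrt(d_k)            # (seq_q, seq_k)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)            # rows sum to 1
    return weights @ V                                        # (seq_q, d_k)

# Self-attention over one 5-token sentence with 64-dimensional vectors:
rng = np.random.default_rng(0)
Q = K = V = rng.normal(size=(5, 64))
out = scaled_dot_product_attention(Q, K, V)  # shape (5, 64)
```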
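The word-by-word generation and the <EOS> stopping rule can be made concrete as a greedy decoding loop; `model`, `bos_id`, and `eos_id` are hypothetical placeholders for a trained network and its special-token ids, not part of the card:

```python
import numpy as np

def greedy_decode(model, src_tokens, bos_id, eos_id, max_len=50):
    """Generate the output sentence one token at a time.
    `model(src, out)` is assumed to return a probability distribution
    over the whole vocabulary for the next token (shape: (vocab_size,))."""
    output = [bos_id]
    for _ in range(max_len):
        probs = model(src_tokens, output)
        next_token = int(np.argmax(probs))  # greedily pick the most probable word
        output.append(next_token)
        if next_token == eos_id:            # stop only when <EOS> is predicted
            break
    return output
```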