Question:
Transformer - Architecture
Author: Christian N
Answer:
Transformers are the standard architecture used in NLP (chatbots, translators).
• Contrary to LSTMs, they do not process the input sequentially -> high parallelization.
• They need positional encodings to specify the position of each word (see the first sketch below).
• They use multiple attention layers to keep track of important information across the sentence (see the attention sketch below).
• The output sentence is produced word by word.
• At each step, the output of the network is a probability distribution over all words in the dictionary, used to predict the next word in the sentence.
• The process stops only when <EOS> is predicted (see the decoding sketch below).
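A minimal NumPy sketch of the sinusoidal positional encoding from the original Transformer paper ("Attention Is All You Need"); the function name and the assumption of an even d_model are illustrative, not from the card:

```python
import numpy as np

def sinusoidal_positional_encoding(seq_len: int, d_model: int) -> np.ndarray:
    """PE[pos, 2i]   = sin(pos / 10000**(2i / d_model))
    PE[pos, 2i+1] = cos(pos / 10000**(2i / d_model))
    Assumes d_model is even."""
    positions = np.arange(seq_len)[:, np.newaxis]                # (seq_len, 1)
    div_terms = 10000.0 ** (np.arange(0, d_model, 2) / d_model)  # (d_model/2,)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(positions / div_terms)  # even dimensions
    pe[:, 1::2] = np.cos(positions / div_terms)  # odd dimensions
    return pe

# Added to the word embeddings before the first layer, e.g.:
# x = token_embeddings + sinusoidal_positional_encoding(seq_len, d_model)
```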
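The attention layers are built from scaled dot-product attention, softmax(Q K^T / sqrt(d_k)) V; a self-contained sketch assuming 2-D NumPy arrays of shape (seq_len, d_k):

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Compute softmax(Q @ K.T / sqrt(d_k)) @ V with a numerically
    stable row-wise softmax."""
    d_k = Q.shape[-1]
    scores = Q @ K.swapaxes(-2, -1) / np.sqrt(d_k)            # (seq_q, seq_k)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)            # rows sum to 1
    return weights @ V                                        # (seq_q, d_k)

# Self-attention over one 5-token sentence with 64-dimensional vectors:
rng = np.random.default_rng(0)
Q = K = V = rng.normal(size=(5, 64))
out = scaled_dot_product_attention(Q, K, V)  # shape (5, 64)
```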
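The word-by-word generation and the <EOS> stopping rule can be made concrete as a greedy decoding loop; `model`, `bos_id`, and `eos_id` are hypothetical placeholders for a trained network and its special-token ids, not part of the card:

```python
import numpy as np

def greedy_decode(model, src_tokens, bos_id, eos_id, max_len=50):
    """Generate the output sentence one token at a time.
    `model(src, out)` is assumed to return a probability distribution
    over the whole vocabulary for the next token (shape: (vocab_size,))."""
    output = [bos_id]
    for _ in range(max_len):
        probs = model(src_tokens, output)
        next_token = int(np.argmax(probs))  # greedily pick the most probable word
        output.append(next_token)
        if next_token == eos_id:            # stop only when <EOS> is predicted
            break
    return output
```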