From course: Intro to AI 2

Question:

Transformer - Add & Norm

Author: Christian N



Answer:

Add & Norm = a residual (skip) connection followed by layer normalization: the output of the sub-layer (e.g., the attention block) is added to that sub-layer's own input (which, in the first block, is the input embedding from the embedding step), and the sum is then layer-normalized: output = LayerNorm(x + Sublayer(x)).

Benefits - faster and more stable training; the residual path improves gradient flow through deep stacks; normalization keeps activations in a consistent range, preventing weight/activation explosion.

Types of normalization - batch normalization & layer normalization. *Layer normalization is preferable for transformers, especially for natural language processing tasks, because it normalizes each token across its features and so does not depend on batch size or sequence length.
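To make the two steps concrete, here is a minimal NumPy sketch of a post-norm Add & Norm as described in the original Transformer paper. The function names (layer_norm, add_and_norm) and the shapes used are illustrative assumptions, not part of the course material:

```python
import numpy as np

def layer_norm(x, gamma, beta, eps=1e-5):
    # Normalize each token across its feature dimension (last axis),
    # then apply the learned scale (gamma) and shift (beta).
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return gamma * (x - mean) / np.sqrt(var + eps) + beta

def add_and_norm(sublayer_input, sublayer_output, gamma, beta):
    # "Add": residual connection; "Norm": layer-normalize the sum,
    # i.e. LayerNorm(x + Sublayer(x)).
    return layer_norm(sublayer_input + sublayer_output, gamma, beta)

# Usage sketch: a sequence of 4 tokens with model dimension 8.
d_model = 8
x = np.random.randn(4, d_model)          # sub-layer input (e.g., embeddings)
attn_out = np.random.randn(4, d_model)   # stand-in for attention output
gamma, beta = np.ones(d_model), np.zeros(d_model)
y = add_and_norm(x, attn_out, gamma, beta)
print(y.shape)  # (4, 8) - same shape, so blocks can be stacked
```

Note that layer_norm reduces over the last axis only: each token is normalized independently, which is why this works for variable batch sizes and sequence lengths, unlike batch normalization.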

