Question:
Transformer - Add & Norm
Author: Christian N
Answer:
Add & Norm: the output of the previous sub-layer (e.g., the attention block) is added to that sub-layer's input (a residual connection; in the first block this is the input embedding), and the sum is passed through layer normalization.
Benefits - faster training, reduced bias, prevents weight explosion.
Types of normalization - batch normalization and layer normalization. *Layer normalization is preferable for transformers, especially for natural language processing tasks.
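A minimal sketch of the Add & Norm step, assuming PyTorch; the class name AddAndNorm, the model dimension of 512, and the tensor shapes are illustrative assumptions, not taken from any particular implementation.

import torch
import torch.nn as nn

class AddAndNorm(nn.Module):
    """Residual connection ("Add") followed by layer normalization ("Norm")."""
    def __init__(self, d_model: int):
        super().__init__()
        self.norm = nn.LayerNorm(d_model)

    def forward(self, sublayer_input: torch.Tensor, sublayer_output: torch.Tensor) -> torch.Tensor:
        # "Add": sum the sub-layer's output with the input that entered it
        # "Norm": layer-normalize the sum over the feature dimension
        return self.norm(sublayer_input + sublayer_output)

# Usage example with hypothetical shapes (batch=2, sequence length=10, d_model=512)
x = torch.randn(2, 10, 512)          # input to the attention block (e.g., embeddings)
attn_out = torch.randn(2, 10, 512)   # stand-in for the attention block's output
add_norm = AddAndNorm(d_model=512)
y = add_norm(x, attn_out)            # same shape as the inputs

Note that layer normalization normalizes each token's feature vector independently, which is why it is preferred over batch normalization for variable-length language inputs.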