Question:
Transformer - Multi-Head Attention
Author: Christian N

Answer:
1) Concatenate all the attention heads.
2) Multiply by a weight matrix W^O that was trained jointly with the model.
3) The result is the Z matrix, which captures information from all the attention heads. We can send this forward to the FFNN (see the sketch below).
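A minimal NumPy sketch of these three steps. The head outputs, dimensions, and the matrix `W_O` here are illustrative stand-ins (random numbers, not trained weights):

```python
import numpy as np

# Hypothetical dimensions, chosen only for illustration.
seq_len, d_head, num_heads = 4, 8, 2
d_model = num_heads * d_head  # width after concatenating the heads

rng = np.random.default_rng(0)

# Stand-ins for the per-head attention outputs Z_i (each seq_len x d_head).
head_outputs = [rng.standard_normal((seq_len, d_head)) for _ in range(num_heads)]

# 1) Concatenate all the attention heads along the feature axis.
Z_concat = np.concatenate(head_outputs, axis=-1)  # (seq_len, d_model)

# 2) Multiply by W^O; in a real model this matrix is learned jointly
#    with the rest of the network (random here as a placeholder).
W_O = rng.standard_normal((d_model, d_model))
Z = Z_concat @ W_O  # (seq_len, d_model)

# 3) Z mixes information from all heads and is what gets sent to the FFNN.
print(Z.shape)  # (4, 16)
```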