Positional Encoding

Positional Encoding is a technique used in the Transformer model to capture the order, or position, of words in a sentence. This is critical because, unlike Recurrent Neural Networks (RNNs), Transformers process all tokens in parallel rather than sequentially, so they have no inherent notion of the order of elements in an input sequence.

To overcome this, positional encodings are added to the input embeddings. These encodings are vectors that represent the position of a word within the sentence. They have the same dimension as the embeddings so that the two can be summed element-wise.
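As a minimal sketch of this summation step (the sizes seq_len and d_model and the random embeddings are illustrative assumptions; the encoding values are left as a zero placeholder here, since the actual formula is given below):

```python
import numpy as np

# Hypothetical sizes for illustration.
seq_len, d_model = 10, 512

token_embeddings = np.random.randn(seq_len, d_model)  # stand-in for learned word embeddings
positional_encoding = np.zeros((seq_len, d_model))    # placeholder; real values come from
                                                      # the sinusoidal formula in the next section

# Both matrices have shape (seq_len, d_model), so they can be summed element-wise.
encoder_input = token_embeddings + positional_encoding
```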

The positional encoding for position $pos$ and dimension index $i$ is calculated using sine and cosine functions as follows:

$$
PE_{(pos,\,2i)} = \sin\!\left(\frac{pos}{10000^{2i/d_{\text{model}}}}\right),
\qquad
PE_{(pos,\,2i+1)} = \cos\!\left(\frac{pos}{10000^{2i/d_{\text{model}}}}\right)
$$

Where:

  • $pos$ is the position of a word in the sentence.
  • $2i$ and $2i+1$ are the even and odd dimensions in the embedding space.
  • $d_{\text{model}}$ is the total dimension of the word embeddings.
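The formula above can be sketched in a few lines of NumPy. The function name and parameters below are illustrative rather than taken from any particular library, and d_model is assumed to be even:

```python
import numpy as np

def sinusoidal_positional_encoding(seq_len: int, d_model: int) -> np.ndarray:
    """Return a (seq_len, d_model) matrix of sinusoidal positional encodings."""
    positions = np.arange(seq_len)[:, np.newaxis]        # pos, shape (seq_len, 1)
    even_dims = np.arange(0, d_model, 2)[np.newaxis, :]  # 2i, shape (1, d_model // 2)
    angles = positions / np.power(10000.0, even_dims / d_model)

    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles)   # even dimensions 2i   -> sine
    pe[:, 1::2] = np.cos(angles)   # odd dimensions 2i+1  -> cosine
    return pe

pe = sinusoidal_positional_encoding(seq_len=10, d_model=512)
print(pe.shape)  # (10, 512), matching the input embeddings it is added to
```

This matrix would replace the zero placeholder in the earlier sketch.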

These functions were chosen because they give each position a unique encoding and allow the model to easily learn to attend by relative positions: for any fixed offset $k$, $PE_{pos+k}$ can be expressed as a linear function of $PE_{pos}$. The sinusoidal form also lets the model generalize to sequence lengths longer than those seen during training.
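To make the relative-position claim concrete, here is a short derivation (not in the original text) using the standard angle-addition identities. Writing $\omega_i = 1/10000^{2i/d_{\text{model}}}$ for the frequency of the dimension pair $(2i, 2i+1)$, the encoding at position $pos+k$ is a rotation of the encoding at position $pos$:

$$
\begin{pmatrix} \sin(\omega_i (pos+k)) \\ \cos(\omega_i (pos+k)) \end{pmatrix}
=
\begin{pmatrix} \cos(\omega_i k) & \sin(\omega_i k) \\ -\sin(\omega_i k) & \cos(\omega_i k) \end{pmatrix}
\begin{pmatrix} \sin(\omega_i pos) \\ \cos(\omega_i pos) \end{pmatrix}
$$

Because the rotation matrix depends only on the offset $k$ and not on $pos$, a linear transformation learned by the model can relate positions a fixed distance apart regardless of their absolute location in the sequence.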