mirror of
https://github.com/elder-plinius/L1B3RT4S.git
synced 2025-09-26 02:33:39 +02:00
8 lines
569 B
Markdown
Special Tokens

**\<SOS> (Start of Sequence)**: Marks the beginning of a sequence, signaling where the model should start processing.

**\<EOS> (End of Sequence)**: Marks the end of a sequence, telling the model when to stop generating or processing text.

**\<PAD> (Padding Token)**: Fills shorter sequences so that every sequence in a batch has the same length.

**\<UNK> (Unknown Token)**: Stands in for words that are not in the model's vocabulary.

**\<MASK> (Mask Token)**: Hides a token so the model can be asked to predict it, as in masked language models.

**\<SEP> (Separator Token)**: Separates different segments of the input, such as a question from its context.
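
To make the roles above concrete, here is a minimal sketch of a toy word-level encoder that uses all six tokens. The helper names (`build_vocab`, `encode`, `mask_position`), the id ordering, and the example sentences are assumptions made for illustration only; real tokenizers often use different token strings and work on subwords rather than whole words.

```python
# Toy illustration only: ids and helper names are hypothetical.
SPECIAL_TOKENS = ["<PAD>", "<UNK>", "<SOS>", "<EOS>", "<MASK>", "<SEP>"]


def build_vocab(words):
    """Map each token to an integer id, special tokens first."""
    vocab = {tok: i for i, tok in enumerate(SPECIAL_TOKENS)}
    for word in words:
        if word not in vocab:
            vocab[word] = len(vocab)
    return vocab


def encode(segments, vocab, max_len):
    """Encode a list of segments (each a list of words) into max_len ids."""
    tokens = ["<SOS>"]                      # start of sequence
    for i, segment in enumerate(segments):
        if i > 0:
            tokens.append("<SEP>")          # boundary between segments
        tokens.extend(segment)
    tokens.append("<EOS>")                  # end of sequence
    # Words missing from the vocabulary fall back to <UNK>.
    ids = [vocab.get(tok, vocab["<UNK>"]) for tok in tokens]
    ids = ids[:max_len]                     # naive truncation (may drop <EOS>)
    ids += [vocab["<PAD>"]] * (max_len - len(ids))  # pad to a fixed length
    return ids


def mask_position(ids, position, vocab):
    """Return a copy of ids with one position replaced by <MASK>."""
    masked = list(ids)
    masked[position] = vocab["<MASK>"]
    return masked


vocab = build_vocab(["what", "is", "a", "token", "unit", "of", "text"])
question = ["what", "is", "a", "token"]
context = ["a", "unit", "of", "language"]   # "language" is out of vocabulary

ids = encode([question, context], vocab, max_len=14)
print(ids)     # [2, 6, 7, 8, 9, 5, 8, 10, 11, 1, 3, 0, 0, 0]
print(mask_position(ids, 4, vocab))
```

In the printed ids, `2` and `3` bracket the sequence, `5` separates the question from the context, `1` replaces the out-of-vocabulary word, and the trailing `0`s are padding.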