Special Tokens
<SOS> (Start of Sequence): Marks the beginning of a sequence for the model to start processing.
<EOS> (End of Sequence): Tells the model when to stop generating text or processing.
<PAD> (Padding Token): Pads sequences to the same length for batch processing.
<UNK> (Unknown Token): Represents words not in the model's vocabulary.
<MASK> (Mask Token): Used in tasks such as predicting missing words in masked language models.
<SEP> (Separator Token): Separates different segments of the input, such as a question from its context.
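A minimal sketch of how these tokens come together in practice, using a hypothetical toy vocabulary (real tokenizers expose the same ideas through their own APIs): <SOS>/<EOS> frame each sequence, <UNK> stands in for out-of-vocabulary words, and <PAD> fills shorter sequences so a batch is rectangular.

```python
# Sketch only: a toy vocabulary, not a real tokenizer.
SPECIAL_TOKENS = ["<PAD>", "<UNK>", "<SOS>", "<EOS>", "<SEP>", "<MASK>"]
WORDS = ["what", "is", "padding", "tokens", "pad", "sequences"]
VOCAB = {tok: i for i, tok in enumerate(SPECIAL_TOKENS + WORDS)}


def encode(text: str) -> list[int]:
    """Wrap a sentence in <SOS>/<EOS>; map unknown words to <UNK>."""
    ids = [VOCAB["<SOS>"]]
    for word in text.lower().split():
        ids.append(VOCAB.get(word, VOCAB["<UNK>"]))
    ids.append(VOCAB["<EOS>"])
    return ids


def pad_batch(batch: list[list[int]]) -> list[list[int]]:
    """Right-pad every sequence with <PAD> so all rows have equal length."""
    max_len = max(len(seq) for seq in batch)
    return [seq + [VOCAB["<PAD>"]] * (max_len - len(seq)) for seq in batch]


if __name__ == "__main__":
    batch = pad_batch([
        encode("what is padding"),
        encode("padding tokens pad sequences evenly"),  # "evenly" -> <UNK>
    ])
    for seq in batch:
        print(seq)
```

In masked language models, <MASK> replaces tokens the model must predict, and <SEP> marks the boundary between paired segments (for example, a question and its context) within one input.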