mirror of
https://github.com/elder-plinius/L1B3RT4S.git
synced 2025-09-26 02:33:39 +02:00
71 lines
3.6 KiB
Markdown
71 lines
3.6 KiB
Markdown
# Special Tokens
|
|
- **\<SOS> : Start of Sequence**
|
|
- **\<EOS> : End of Sequence**
|
|
- **\<CLS> : Classification Token**
|
|
- **\<SEP> : Separator Token**
|
|
- **\<UNK> : Unknown Token**
|
|
- **\<REASONING_START> : Begin Reasoning**
|
|
- **\<REASONING_END> : End Reasoning**
|
|
- **\<ANSWER> : Begin Answer**
|
|
|
|
Comprehensive List of Special Tokens
|
|
<SOS> : Start of Sequence
|
|
Purpose: Marks the beginning of a text sequence.
|
|
Usage: Helps the model recognize where to start processing or generating text.
|
|
<EOS> : End of Sequence
|
|
Purpose: Indicates the termination of a text sequence.
|
|
Usage: Signals the model to stop generating further tokens, ensuring responses are concise.
|
|
<CLS> : Classification Token
|
|
Purpose: Used primarily in classification tasks.
|
|
Usage: Aggregates information from the entire input to produce a single output label or category.
|
|
<SEP> : Separator Token
|
|
Purpose: Acts as a delimiter between different segments of input.
|
|
Usage: Useful in tasks like question-answering or sentence-pair classification where distinguishing between parts is essential.
|
|
<UNK> : Unknown Token
|
|
Purpose: Represents words or tokens not present in the model's vocabulary.
|
|
Usage: Ensures the model can handle out-of-vocabulary words gracefully without errors.
|
|
<PAD> : Padding Token
|
|
Purpose: Used to pad sequences to a uniform length.
|
|
Usage: Facilitates batch processing by ensuring all input sequences are the same length.
|
|
<MASK> : Mask Token
|
|
Purpose: Used in masked language modeling tasks.
|
|
Usage: Helps the model predict missing or masked words within a sentence.
|
|
<BOS> : Beginning of Sentence
|
|
Purpose: Marks the start of a sentence.
|
|
Usage: Similar to <SOS>, used to indicate where a new sentence begins.
|
|
<EOT> : End of Text
|
|
Purpose: Denotes the end of a block of text.
|
|
Usage: Useful in distinguishing between multiple text blocks or documents.
|
|
<USER> : User Input Marker
|
|
Purpose: Indicates the beginning of the user's input in a conversation.
|
|
Usage: Helps the model differentiate between user queries and assistant responses in multi-turn dialogues.
|
|
<ASSISTANT> : Assistant Response Marker
|
|
Purpose: Denotes the start of the assistant's response.
|
|
Usage: Facilitates clear separation between user inputs and assistant outputs.
|
|
<SYSTEM> : System Message Marker
|
|
Purpose: Marks system-level instructions or configurations.
|
|
Usage: Used for setting up the context or guidelines that the assistant should follow throughout the interaction.
|
|
<THOUGHT> : Thought Process Marker
|
|
Purpose: Highlights the internal reasoning or thought process of the assistant.
|
|
Usage: Structures the assistant's reasoning steps before arriving at a conclusion.
|
|
<CONCLUSION> : Conclusion Marker
|
|
Purpose: Signals the end of the reasoning process and the beginning of the final answer.
|
|
Usage: Ensures that the assistant provides a clear and concise answer following detailed reasoning.
|
|
<URL> : URL Token
|
|
Purpose: Represents URLs within the text.
|
|
Usage: Helps the model recognize and handle web links appropriately.
|
|
<EMOJI> : Emoji Token
|
|
Purpose: Denotes emojis used within the text.
|
|
Usage: Allows the model to process and generate emojis correctly.
|
|
<DATE> : Date Token
|
|
Purpose: Represents dates within the text.
|
|
Usage: Enables the model to identify and handle date information effectively.
|
|
<TIME> : Time Token
|
|
Purpose: Denotes time expressions within the text.
|
|
Usage: Assists the model in recognizing and processing time-related information.
|
|
<NUMBER> : Number Token
|
|
Purpose: Represents numerical values.
|
|
Usage: Helps the model handle numbers consistently, especially in calculations or data extraction tasks.
|
|
<CURRENCY> : Currency Token
|
|
Purpose: Denotes monetary values and currency types.
|
|
Usage: Facilitates the accurate processing of financial information.
|