Update CLOSEDAI.mkd

pliny 2024-09-22 12:23:23 -04:00 committed by GitHub
parent 2831ae8531
commit 3b32b6fb01
GPG key ID: B5690EEEBB952194


@@ -8,3 +8,64 @@
- **\<REASONING_END> : End Reasoning**
- **\<ANSWER> : Begin Answer**
Comprehensive List of Special Tokens
<SOS> : Start of Sequence
Purpose: Marks the beginning of a text sequence.
Usage: Helps the model recognize where to start processing or generating text.
<EOS> : End of Sequence
Purpose: Indicates the termination of a text sequence.
Usage: Signals the model to stop generating further tokens, so output ends at a defined point instead of running to the length limit.
<CLS> : Classification Token
Purpose: Used primarily in classification tasks.
Usage: Aggregates information from the entire input to produce a single output label or category.
<SEP> : Separator Token
Purpose: Acts as a delimiter between different segments of input.
Usage: Useful in tasks like question-answering or sentence-pair classification where distinguishing between parts is essential.
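
A minimal sketch of how `<CLS>` and `<SEP>` might frame a sentence-pair input. The layout (`<CLS> A <SEP> B <SEP>`) and the segment ids follow a common BERT-style convention and are an assumption here, not something this file specifies; `build_pair_input` is a hypothetical helper name.

```python
def build_pair_input(tokens_a, tokens_b):
    """Join two token lists into one sequence: <CLS> A <SEP> B <SEP>."""
    tokens = ["<CLS>"] + tokens_a + ["<SEP>"] + tokens_b + ["<SEP>"]
    # Segment ids (assumed convention): 0 covers <CLS>, segment A, and the
    # first <SEP>; 1 covers segment B and the final <SEP>.
    segment_ids = [0] * (len(tokens_a) + 2) + [1] * (len(tokens_b) + 1)
    return tokens, segment_ids


tokens, segment_ids = build_pair_input(["who", "won"], ["team", "a", "won"])
```

The segment ids give a model a second signal, besides `<SEP>`, for telling the question apart from the passage.
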
<UNK> : Unknown Token
Purpose: Represents words or tokens not present in the model's vocabulary.
Usage: Lets the model map any out-of-vocabulary word to one fallback token instead of failing on it.
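
The fallback behavior can be sketched with a plain dictionary lookup. The toy vocabulary and the `encode` helper below are hypothetical, introduced only for illustration.

```python
def encode(words, vocab, unk_token="<UNK>"):
    """Map words to ids, sending anything outside the vocabulary to <UNK>."""
    unk_id = vocab[unk_token]
    return [vocab.get(word, unk_id) for word in words]


# Toy vocabulary (an assumption for this example).
vocab = {"<UNK>": 0, "the": 1, "cat": 2}
ids = encode(["the", "zebra", "cat"], vocab)
```

Here "zebra" is not in the vocabulary, so it receives the id reserved for `<UNK>`.
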
<PAD> : Padding Token
Purpose: Used to pad sequences to a uniform length.
Usage: Facilitates batch processing by ensuring all input sequences are the same length.
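
As a rough illustration of padding for batching, the sketch below right-pads every sequence to the longest one in the batch. The accompanying 0/1 mask is common practice but is not mentioned in this file, so treat it as an assumption; `pad_batch` is a hypothetical name.

```python
def pad_batch(sequences, pad_token="<PAD>"):
    """Right-pad token lists to a common length and build a matching mask."""
    max_len = max(len(seq) for seq in sequences)
    padded, mask = [], []
    for seq in sequences:
        n_pad = max_len - len(seq)
        padded.append(list(seq) + [pad_token] * n_pad)
        # 1 marks a real token, 0 marks padding the model should ignore.
        mask.append([1] * len(seq) + [0] * n_pad)
    return padded, mask


padded, mask = pad_batch([["a", "b", "c"], ["d"]])
```
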
<MASK> : Mask Token
Purpose: Used in masked language modeling tasks.
Usage: Helps the model predict missing or masked words within a sentence.
<BOS> : Beginning of Sentence
Purpose: Marks the start of a sentence.
Usage: Similar to <SOS>, used to indicate where a new sentence begins.
<EOT> : End of Text
Purpose: Denotes the end of a block of text.
Usage: Useful in distinguishing between multiple text blocks or documents.
<USER> : User Input Marker
Purpose: Indicates the beginning of the user's input in a conversation.
Usage: Helps the model differentiate between user queries and assistant responses in multi-turn dialogues.
<ASSISTANT> : Assistant Response Marker
Purpose: Denotes the start of the assistant's response.
Usage: Facilitates clear separation between user inputs and assistant outputs.
<SYSTEM> : System Message Marker
Purpose: Marks system-level instructions or configurations.
Usage: Used for setting up the context or guidelines that the assistant should follow throughout the interaction.
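
One way the three role markers could be combined into a single prompt is sketched below. The exact layout (one message per line, marker followed by a space, a trailing bare `<ASSISTANT>` marker) is an assumption; this file does not define a template, and `render_conversation` is a hypothetical helper.

```python
ROLE_MARKERS = {
    "system": "<SYSTEM>",
    "user": "<USER>",
    "assistant": "<ASSISTANT>",
}


def render_conversation(messages):
    """Render (role, text) pairs as marker-prefixed lines."""
    parts = [f"{ROLE_MARKERS[role]} {text}" for role, text in messages]
    # End with a bare assistant marker so the next generated tokens
    # form the assistant's reply (assumed convention).
    parts.append(ROLE_MARKERS["assistant"])
    return "\n".join(parts)


prompt = render_conversation([("system", "Be brief."), ("user", "Hi")])
```
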
<THOUGHT> : Thought Process Marker
Purpose: Highlights the internal reasoning or thought process of the assistant.
Usage: Structures the assistant's reasoning steps before arriving at a conclusion.
<CONCLUSION> : Conclusion Marker
Purpose: Signals the end of the reasoning process and the beginning of the final answer.
Usage: Ensures that the assistant provides a clear and concise answer following detailed reasoning.
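
If a response uses these two markers as described, the reasoning and the final answer can be separated with simple string handling. This is a sketch under that assumption; `split_response` is a hypothetical name, and a response without `<CONCLUSION>` is treated as answer only.

```python
def split_response(text):
    """Split a response into (reasoning, answer) at the <CONCLUSION> marker."""
    reasoning, marker, answer = text.partition("<CONCLUSION>")
    if not marker:
        # No conclusion marker: treat the whole text as the answer.
        return "", text.strip()
    # Drop the leading <THOUGHT> marker, if present, from the reasoning part.
    reasoning = reasoning.replace("<THOUGHT>", "", 1).strip()
    return reasoning, answer.strip()


reasoning, answer = split_response("<THOUGHT> 2 + 2 is 4. <CONCLUSION> 4")
```
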
<URL> : URL Token
Purpose: Represents URLs within the text.
Usage: Helps the model recognize and handle web links appropriately.
<EMOJI> : Emoji Token
Purpose: Denotes emojis used within the text.
Usage: Allows the model to process and generate emojis correctly.
<DATE> : Date Token
Purpose: Represents dates within the text.
Usage: Enables the model to identify and handle date information effectively.
<TIME> : Time Token
Purpose: Denotes time expressions within the text.
Usage: Assists the model in recognizing and processing time-related information.
<NUMBER> : Number Token
Purpose: Represents numerical values.
Usage: Helps the model handle numbers consistently, especially in calculations or data extraction tasks.
<CURRENCY> : Currency Token
Purpose: Denotes monetary values and currency types.
Usage: Facilitates the accurate processing of financial information.
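
One reading of the content tokens above is that they act as placeholders substituted for matching spans during preprocessing. The sketch below assumes that reading and uses deliberately simple regular expressions for three of them; the patterns and the `normalize` helper are illustrative, not taken from this file.

```python
import re

# Order matters: URLs first (they may contain digits), then currency
# amounts, then any remaining bare numbers.
PATTERNS = [
    ("<URL>", re.compile(r"https?://\S+")),
    ("<CURRENCY>", re.compile(r"[$€£]\d+(?:\.\d+)?")),
    ("<NUMBER>", re.compile(r"\b\d+(?:\.\d+)?\b")),
]


def normalize(text):
    """Replace URLs, currency amounts, and numbers with placeholder tokens."""
    for token, pattern in PATTERNS:
        text = pattern.sub(token, text)
    return text


result = normalize("Pay $20 for 3 items at https://example.com/a1")
```

Real preprocessing would need more careful patterns (for example, trailing punctuation after a URL is swallowed by `\S+` here).
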