Update CLOSEDAI.mkd

2025-12-13 18:08:44 +01:00 · 2024-09-22 12:23:23 -04:00 · 2024-09-22 12:23:23 -04:00 · 3b32b6fb01
commit 3b32b6fb01
parent 2831ae8531
1 changed files with 61 additions and 0 deletions
--- a/CLOSEDAI.mkd
+++ b/CLOSEDAI.mkd
@ -8,3 +8,64 @@
 - **\<REASONING_END> : End Reasoning**
 - **\<ANSWER> : Begin Answer**

+Comprehensive List of Special Tokens
+<SOS> : Start of Sequence
+Purpose: Marks the beginning of a text sequence.
+Usage: Helps the model recognize where to start processing or generating text.
+<EOS> : End of Sequence
+Purpose: Indicates the termination of a text sequence.
+Usage: Signals the model to stop generating further tokens, ensuring responses are concise.
+<CLS> : Classification Token
+Purpose: Used primarily in classification tasks.
+Usage: Aggregates information from the entire input to produce a single output label or category.
+<SEP> : Separator Token
+Purpose: Acts as a delimiter between different segments of input.
+Usage: Useful in tasks like question-answering or sentence-pair classification where distinguishing between parts is essential.
+<UNK> : Unknown Token
+Purpose: Represents words or tokens not present in the model's vocabulary.
+Usage: Ensures the model can handle out-of-vocabulary words gracefully without errors.
+<PAD> : Padding Token
+Purpose: Used to pad sequences to a uniform length.
+Usage: Facilitates batch processing by ensuring all input sequences are the same length.
+<MASK> : Mask Token
+Purpose: Used in masked language modeling tasks.
+Usage: Helps the model predict missing or masked words within a sentence.
+<BOS> : Beginning of Sentence
+Purpose: Marks the start of a sentence.
+Usage: Similar to <SOS>, used to indicate where a new sentence begins.
+<EOT> : End of Text
+Purpose: Denotes the end of a block of text.
+Usage: Useful in distinguishing between multiple text blocks or documents.
+<USER> : User Input Marker
+Purpose: Indicates the beginning of the user's input in a conversation.
+Usage: Helps the model differentiate between user queries and assistant responses in multi-turn dialogues.
+<ASSISTANT> : Assistant Response Marker
+Purpose: Denotes the start of the assistant's response.
+Usage: Facilitates clear separation between user inputs and assistant outputs.
+<SYSTEM> : System Message Marker
+Purpose: Marks system-level instructions or configurations.
+Usage: Used for setting up the context or guidelines that the assistant should follow throughout the interaction.
+<THOUGHT> : Thought Process Marker
+Purpose: Highlights the internal reasoning or thought process of the assistant.
+Usage: Structures the assistant's reasoning steps before arriving at a conclusion.
+<CONCLUSION> : Conclusion Marker
+Purpose: Signals the end of the reasoning process and the beginning of the final answer.
+Usage: Ensures that the assistant provides a clear and concise answer following detailed reasoning.
+<URL> : URL Token
+Purpose: Represents URLs within the text.
+Usage: Helps the model recognize and handle web links appropriately.
+<EMOJI> : Emoji Token
+Purpose: Denotes emojis used within the text.
+Usage: Allows the model to process and generate emojis correctly.
+<DATE> : Date Token
+Purpose: Represents dates within the text.
+Usage: Enables the model to identify and handle date information effectively.
+<TIME> : Time Token
+Purpose: Denotes time expressions within the text.
+Usage: Assists the model in recognizing and processing time-related information.
+<NUMBER> : Number Token
+Purpose: Represents numerical values.
+Usage: Helps the model handle numbers consistently, especially in calculations or data extraction tasks.
+<CURRENCY> : Currency Token
+Purpose: Denotes monetary values and currency types.
+Usage: Facilitates the accurate processing of financial information.