Generation Parameters in Skyone Studio
Introduction
Large Language Models (LLMs) are artificial intelligence systems capable of understanding and generating text in a way that resembles human communication. They are trained on billions of words and language examples to predict the next word in a sentence.
In Skyone Studio, an LLM’s behavior can be adjusted at generation time through configuration parameters. These parameters act like control levers: they let the user decide whether answers should be shorter or longer, more creative or more precise, more varied or more objective.
This document explains in detail the main text generation parameters available, helping both technical professionals and business users understand and use the tool more effectively.
Key Terms (Glossary)
LLM (Large Language Model): A large-scale language model trained to understand and generate text.
Token: The minimum unit of text used by the model (it can be a whole word, part of a word, or even a symbol); a tokenizer sketch follows this glossary.
Prompt: The text or instruction provided by the user for the model to generate a response.
Max_tokens: The maximum number of tokens the model can generate in an output.
Temperature: A parameter that controls the level of creativity/randomness in the text.
Top_p (Nucleus Sampling): Keeps only the smallest set of most probable tokens whose cumulative probability reaches the chosen threshold.
Top_k: Limits the number of possible tokens at each generation step.
Presence_penalty: Penalizes tokens that have already appeared, encouraging variety in the text.
Stop: Defines words or symbols that interrupt text generation.
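
To make the idea of a token concrete, the sketch below uses the open-source tiktoken tokenizer. Skyone Studio’s models may split text differently, so the exact token IDs and boundaries here are illustrative only.

import tiktoken  # open-source tokenizer library, used here purely for illustration

enc = tiktoken.get_encoding("cl100k_base")
ids = enc.encode("Skyone Studio generates text.")
print(ids)                             # numeric token IDs
print([enc.decode([i]) for i in ids])  # the text fragment behind each ID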
Generation Parameters
Max_tokens
Description: Sets the maximum number of tokens the model can generate.
Practical Example:
max_tokens = 15 → short answer.
max_tokens = 100 → long and detailed answer.
Analogy: It’s like choosing the size of the sheet of paper the model can write on.
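
The minimal Python sketch below shows the mechanic: whatever the step function produces, the decoding loop hard-caps the output at max_tokens. The step function here is a toy stand-in, not Skyone Studio’s actual decoder.

def decode(next_token_fn, prompt_ids, max_tokens):
    # Minimal decoding loop: output length is hard-capped at max_tokens,
    # no matter how much more the model "wants" to write.
    output = []
    while len(output) < max_tokens:
        output.append(next_token_fn(prompt_ids + output))
    return output

# Toy step function that always emits token id 7, just to exercise the cap.
print(len(decode(lambda ids: 7, [1, 2, 3], max_tokens=15)))  # -> 15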

Temperature
Description: Controls the creativity and randomness of the response.
Low temperature → Objective and predictable answers.
High temperature → Creative and varied answers.
Analogy: It’s like the “temperature” of a conversation: cold (direct) or warm (diverse and full of ideas).
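
A minimal sketch of temperature sampling, assuming the standard softmax-with-temperature formulation used by most LLM decoders; the toy logits are invented for illustration.

import numpy as np

def sample_with_temperature(logits, temperature):
    # Lower temperature sharpens the distribution (predictable output);
    # higher temperature flattens it (more varied output).
    scaled = np.asarray(logits, dtype=float) / max(temperature, 1e-8)
    probs = np.exp(scaled - scaled.max())  # numerically stable softmax
    probs /= probs.sum()
    return np.random.choice(len(probs), p=probs)

logits = [2.0, 1.0, 0.1]  # toy scores for a 3-token vocabulary
print(sample_with_temperature(logits, temperature=0.2))  # almost always token 0
print(sample_with_temperature(logits, temperature=1.5))  # more varied picks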

Top_p
Description: Defines the cumulative probability mass of the most probable tokens to be considered.
Example:
top_p = 0.1 → only the most likely tokens, up to 10% of the cumulative probability.
top_p = 0.9 → a wider set that includes less common words.
Analogy: It’s like using a sieve: the finer it is, the fewer options get through.
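
A minimal sketch of nucleus sampling under the standard formulation: the smallest set of tokens whose cumulative probability reaches top_p survives, and everything else is discarded. The probabilities are invented for illustration.

import numpy as np

def top_p_filter(probs, top_p):
    # Keep the smallest set of tokens whose cumulative probability
    # reaches top_p, then renormalize and sample only within that set.
    probs = np.asarray(probs, dtype=float)
    order = np.argsort(probs)[::-1]  # most likely first
    cutoff = np.searchsorted(np.cumsum(probs[order]), top_p) + 1
    keep = order[:cutoff]
    filtered = np.zeros_like(probs)
    filtered[keep] = probs[keep]
    return filtered / filtered.sum()

probs = [0.5, 0.3, 0.15, 0.05]
print(top_p_filter(probs, top_p=0.1))  # only the single most likely token survives
print(top_p_filter(probs, top_p=0.9))  # three tokens survive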

Top_k
Description: Restricts generation to the k most likely tokens at each step.
Example:
top_k = 2 → very restricted choices.
top_k = 40 → broader choices.
Analogy: It’s like a menu: it can be small (few options) or large (more variety).
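
A minimal sketch of top-k filtering; as with the nucleus example above, the probabilities are invented for illustration.

import numpy as np

def top_k_filter(probs, top_k):
    # Zero out everything outside the k most likely tokens, then renormalize.
    probs = np.asarray(probs, dtype=float)
    keep = np.argsort(probs)[::-1][:top_k]
    filtered = np.zeros_like(probs)
    filtered[keep] = probs[keep]
    return filtered / filtered.sum()

probs = [0.5, 0.3, 0.15, 0.05]
print(top_k_filter(probs, top_k=2))  # only the two most likely tokens remain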

Presence_penalty
Description: Penalizes tokens that have already appeared, encouraging the model to explore new words and ideas.
Example:
Without penalty: “He likes to run, run, and run...”
With penalty: “He likes to run, play sports, and stay active.”
Analogy: It’s like asking someone not to repeat the same story over and over in a conversation.
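
A minimal sketch, assuming the common flat-penalty formulation (each token that has appeared at least once has a fixed amount subtracted from its score); the exact formula Skyone Studio’s backends apply may differ.

import numpy as np

def apply_presence_penalty(logits, generated_ids, penalty):
    # Subtract a flat penalty from every token that has already appeared,
    # nudging the model toward words it has not used yet.
    logits = np.asarray(logits, dtype=float).copy()
    logits[list(set(generated_ids))] -= penalty
    return logits

logits = [3.0, 1.0, 0.5]
print(apply_presence_penalty(logits, generated_ids=[0, 0, 0], penalty=2.0))
# Token 0 ("run") drops from 3.0 to 1.0, so alternatives become competitive.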

Stop
Description: A list of words or symbols that determine where the model should stop.
Example:
stop = ["end"] → generation halts as soon as this word appears (the stop sequence itself is typically not included in the output).
Analogy: It’s like pressing the “pause” button at the right moment.
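
A minimal sketch of how a stop list can be applied to raw output; real decoders check stop sequences token by token during generation, but post-hoc truncation shows the same effect.

def truncate_at_stop(text, stop_sequences):
    # Cut the output at the earliest occurrence of any stop sequence.
    cut = len(text)
    for seq in stop_sequences:
        idx = text.find(seq)
        if idx != -1:
            cut = min(cut, idx)
    return text[:cut]

print(truncate_at_stop("Summary: all steps done. end (ignored text)", ["end"]))
# -> "Summary: all steps done. "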
Best Practices
Adjust max_tokens according to the expected response length.
Use a low temperature for technical answers and a higher one for creative tasks.
Combine top_p and top_k to balance diversity and predictability.
Apply presence_penalty to avoid redundancy.
Use stop to ensure the output ends at the desired point.
Always log the parameters used so results can be reproduced later, as in the sketch after this list.
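
A sketch that combines these practices into a single parameter set and logs it for reproducibility. The field names follow common LLM APIs and are assumptions here, not the confirmed Skyone Studio request schema.

import json
import logging

params = {
    "prompt": "Write a short product description for a cloud backup service.",
    "max_tokens": 120,        # sized for the expected response length
    "temperature": 0.3,       # low: technical, predictable wording
    "top_p": 0.9,             # allow some diversity...
    "top_k": 40,              # ...but keep the candidate pool bounded
    "presence_penalty": 0.5,  # discourage repeated words and ideas
    "stop": ["\n\n"],         # end cleanly at the first blank line
}

# Log the full parameter set so the result can be reproduced later.
logging.basicConfig(level=logging.INFO)
logging.info("generation params: %s", json.dumps(params))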
