Generation Parameters in Skyone Studio

Introduction

Large Language Models (LLMs) are artificial intelligence systems capable of understanding and generating text in a way that resembles human communication. They are trained on billions of words of text, learning to predict the next word in a sequence.

In Skyone Studio, LLM behavior can be adjusted at generation time through configuration parameters. These parameters act like control levers: they let the user decide whether answers should be shorter or longer, more creative or more precise, more varied or more objective.

This document explains in detail the main text generation parameters available, helping both technical professionals and business users understand and use the tool more effectively.
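
To make these levers concrete before diving in, here is a minimal sketch of how the parameters covered below might travel together in a single request. The endpoint URL and field names are illustrative assumptions, not the documented Skyone Studio API; adapt them to your own deployment.

    import requests  # third-party HTTP client (pip install requests)

    # Hypothetical endpoint and field names; adjust to your actual deployment.
    payload = {
        "prompt": "Summarize our return policy in one paragraph.",
        "max_tokens": 100,        # cap on response length
        "temperature": 0.7,       # creativity/randomness
        "top_p": 0.9,             # nucleus sampling threshold
        "top_k": 40,              # size of the candidate pool per step
        "presence_penalty": 0.5,  # discourage repetition
        "stop": ["END"],          # marker that halts generation
    }
    response = requests.post("https://studio.example.com/v1/generate", json=payload)
    print(response.json())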


Key Terms (Glossary)

  • LLM (Large Language Model): A large-scale language model trained to understand and generate text.

  • Token: The smallest unit of text the model processes (it can be a whole word, part of a word, or even a symbol).

  • Prompt: The text or instruction provided by the user for the model to generate a response.

  • Max_tokens: The maximum number of tokens the model can generate in an output.

  • Temperature: A parameter that controls the level of creativity/randomness in the text.

  • Top_p (Nucleus Sampling): Restricts sampling to the smallest set of most probable tokens whose cumulative probability reaches the threshold p.

  • Top_k: Limits the number of possible tokens at each generation step.

  • Presence_penalty: Penalizes tokens that have already appeared, encouraging variety in the text.

  • Stop: Defines words or symbols that interrupt text generation.


Generation Parameters

Max_tokens

Description: Sets the maximum number of tokens the model can generate in a response. If the limit is reached, the output is cut off, even mid-sentence.

Practical Example:

  • max_tokens = 15 → short answer.

  • max_tokens = 100 → long and detailed answer.

Analogy: It’s like choosing the size of the sheet of paper the model can write on.
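
In practice, the same prompt can be sent with different budgets. The sketch below reuses the hypothetical payload format from the introduction (the endpoint and field names are assumptions):

    import requests

    URL = "https://studio.example.com/v1/generate"  # hypothetical endpoint
    prompt = "Explain what a token is."

    # Same prompt, two budgets: the first reply is cut off after 15 tokens,
    # the second has room for a full explanation.
    short = requests.post(URL, json={"prompt": prompt, "max_tokens": 15})
    long = requests.post(URL, json={"prompt": prompt, "max_tokens": 100})
    print(short.json(), long.json(), sep="\n")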


Temperature

Description: Controls the creativity and randomness of the response.

  • Low temperature → Objective and predictable answers.

  • High temperature → Creative and varied answers.

Analogy: It’s like the “temperature” of a conversation: cold (direct) or warm (diverse and full of ideas).
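
Under the hood, temperature rescales the model’s token probabilities before one is sampled. The self-contained sketch below shows the standard mechanism (dividing scores by the temperature before a softmax) on a toy distribution; it is a conceptual illustration, not Skyone Studio’s internal code.

    import math

    def softmax_with_temperature(logits, temperature):
        # Lower temperature sharpens the distribution (predictable choices);
        # higher temperature flattens it (more varied choices).
        scaled = [score / temperature for score in logits]
        peak = max(scaled)
        exps = [math.exp(s - peak) for s in scaled]
        total = sum(exps)
        return [round(e / total, 3) for e in exps]

    logits = [2.0, 1.0, 0.5]  # toy scores for three candidate tokens
    print(softmax_with_temperature(logits, 0.2))  # ~[0.993, 0.007, 0.001]: near-deterministic
    print(softmax_with_temperature(logits, 1.5))  # ~[0.532, 0.273, 0.196]: much more even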


Top_p

Description: Defines a cumulative probability threshold: at each step, only the most probable tokens whose probabilities add up to top_p are considered.

Example:

  • top_p = 0.1 → only tokens covering the top 10% of probability mass; very focused output.

  • top_p = 0.9 → the candidate pool widens to include less common words.

Analogy: It’s like using a sieve: the finer it is, the fewer options get through.
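
The sieve can be expressed in a few lines of code: sort the candidate tokens by probability and keep them until their cumulative probability reaches top_p. The sketch below is a conceptual illustration on toy values, not the platform’s internal code.

    def top_p_filter(probs, p):
        # Keep the smallest set of most-likely tokens whose cumulative
        # probability reaches p; sampling then happens only inside that set.
        ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
        kept, cumulative = [], 0.0
        for token, prob in ranked:
            kept.append(token)
            cumulative += prob
            if cumulative >= p:
                break
        return kept

    probs = {"cat": 0.5, "dog": 0.3, "fish": 0.15, "kite": 0.05}
    print(top_p_filter(probs, 0.1))  # ['cat']: the most likely token alone covers 10%
    print(top_p_filter(probs, 0.9))  # ['cat', 'dog', 'fish']: less common words get through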


Top_k

Description: Restricts generation to the k most likely tokens at each step.

Example:

  • top_k = 2 → very restricted choices.

  • top_k = 40 → broader choices.

Analogy: It’s like a menu: it can be small (few options) or large (more variety).
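
The menu idea translates directly to code: rank the candidates and keep only the k most likely. Again, this is a toy illustration of the concept, not platform internals.

    def top_k_filter(probs, k):
        # Keep only the k highest-probability tokens as candidates.
        ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
        return [token for token, _ in ranked[:k]]

    probs = {"cat": 0.5, "dog": 0.3, "fish": 0.15, "kite": 0.05}
    print(top_k_filter(probs, 2))   # ['cat', 'dog']: a very small menu
    print(top_k_filter(probs, 40))  # all four tokens: k exceeds this toy vocabulary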


Presence_penalty

Description: Penalizes tokens that have already appeared in the text, encouraging the model to explore new words and ideas.

Example:

  • Without penalty: “He likes to run, run, and run...”

  • With penalty: “He likes to run, play sports, and stay active.”

Analogy: It’s like asking someone not to repeat the same story over and over in a conversation.
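
One common formulation of this penalty (used, for example, in OpenAI-style APIs) subtracts a flat amount from the score of every token that has already appeared. The toy sketch below shows that mechanism; the numbers are illustrative, and Skyone Studio’s exact formula may differ.

    def apply_presence_penalty(logits, seen_tokens, penalty):
        # Subtract a flat penalty from tokens that already appeared,
        # nudging the model toward words it has not used yet.
        return {
            token: score - penalty if token in seen_tokens else score
            for token, score in logits.items()
        }

    logits = {"run": 2.0, "play": 1.2, "rest": 0.8}
    # "run" was already generated, so its score drops below "play".
    print(apply_presence_penalty(logits, {"run"}, 1.5))
    # {'run': 0.5, 'play': 1.2, 'rest': 0.8}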


Stop

Description: A list of words or symbols that tell the model where to stop generating.

Example:

  • stop = ["end"] → generation halts as soon as this word is produced (the stop sequence itself is usually excluded from the output).

Analogy: It’s like pressing the “pause” button at the right moment.
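
A stop list behaves like a cutoff applied to the output stream. The sketch below shows the equivalent post-processing logic as a conceptual illustration:

    def truncate_at_stop(text, stop_sequences):
        # Cut the text at the earliest occurrence of any stop sequence.
        cut = len(text)
        for seq in stop_sequences:
            idx = text.find(seq)
            if idx != -1:
                cut = min(cut, idx)
        return text[:cut]

    draft = "Step 1: mix the batter. end Step 2: never reached."
    print(truncate_at_stop(draft, ["end"]))  # "Step 1: mix the batter. "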


Best Practices

  • Adjust max_tokens according to the expected response length.

  • Use low temperature for technical answers and high for creative tasks.

  • Combine top_p and top_k to balance diversity and predictability.

  • Apply presence_penalty to avoid redundancy.

  • Use stop to ensure the output ends at the desired point.

  • Always log the parameters used so results can be reproduced later (see the sketch below).
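
For the last point, here is a minimal sketch of parameter logging in Python. The parameter names mirror this document; the logging setup itself is an assumption, not a built-in Skyone Studio feature.

    import json
    import logging

    logging.basicConfig(level=logging.INFO)

    params = {
        "max_tokens": 100,
        "temperature": 0.3,
        "top_p": 0.9,
        "top_k": 40,
        "presence_penalty": 0.5,
        "stop": ["END"],
    }
    # Record the exact settings next to each request so the run can be reproduced.
    logging.info("generation params: %s", json.dumps(params, sort_keys=True))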


FAQ

What is a token?

A token is a piece of text, which can be a whole word, part of a word, or even a symbol.

What’s the difference between Top_p and Top_k?

  • Top_k keeps a fixed number of candidate tokens at each step.

  • Top_p keeps a variable number: the smallest set of tokens whose cumulative probability reaches the threshold.

When should I use a high temperature?

For creative tasks such as brainstorming, story generation, or free drafting.

Can the presence_penalty cause problems?

Yes. If set too high, it can hurt coherence by forcing excessive variety.

Do I always need to define all parameters?

No. Many have default values that work well in most cases, but manually tuning them helps achieve more precise results.
