Prompt Engineering: Techniques I Actually Use
In my AI concepts post we defined a prompt as the text or instruction you send to the LLM to ask it to do something. It's your starting point for the conversation.
You no longer need to be a data engineer or a machine learning engineer - everyone can write a prompt.
How does an LLM really work?
Imagine that a large language model or LLM is an incredibly intelligent machine, but with a single, obsessive task: predicting the next word.
When you write a message, or prompt, the LLM doesn't "understand" what you're asking like a person would. Instead, it analyzes your text and, based on the immense amount of data it was trained on, calculates which word (or more precisely, token) is most likely to continue the sequence.
The yellow cat is sitting next to the gray cat ...
For the LLM, the options might look something like this:

LLM probabilities for continuing the sentence
- and (32% probability)
- on (25% probability)
- under (18% probability)
- and other far less likely words like banana, Brazil or beach (how likely is it to see two cats sitting together on a beach? Not very - I've certainly never seen it)
As we can see, the word with the highest probability is "and", since it's the most logical continuation of the sentence. After choosing it, the model repeats the process: it takes the new sentence ("The yellow cat is sitting next to the gray cat and ...") and predicts the next word, and so on, until it has generated the entire response.
The black magic: temperature
This is where it gets interesting and where prompt engineering comes in. If the LLM always chose the word with the highest probability, its responses would be predictable and monotonous. To avoid this, models incorporate a variable called temperature.
Low temperature: Chooses the most likely word. The result is logical, but less creative.
High temperature: Introduces a randomness factor. The model may choose less likely words, generating more original and sometimes surprising responses.
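To make this concrete, here's a minimal sketch of the sampling step. This is not how a real LLM is implemented - the candidate words and their scores are invented for the cat example above - but the math (softmax with temperature) is the same idea:

```python
# Minimal sketch of next-word sampling with temperature.
# The candidate words and their scores are invented for illustration;
# a real LLM scores tens of thousands of tokens at every step.
import math
import random

def sample_next_word(scores: dict[str, float], temperature: float) -> str:
    # Softmax with temperature: low T sharpens the distribution
    # (almost always picks the top word), high T flattens it.
    scaled = [s / temperature for s in scores.values()]
    max_s = max(scaled)
    weights = [math.exp(s - max_s) for s in scaled]
    return random.choices(list(scores.keys()), weights=weights)[0]

scores = {"and": 2.0, "on": 1.7, "under": 1.4, "banana": -3.0}
print(sample_next_word(scores, temperature=0.2))  # almost always "and"
print(sample_next_word(scores, temperature=1.5))  # sometimes "on" or "under"
```

Run it a few times with each temperature and you'll see the difference between "predictable" and "creative" output.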
Prompt engineering is, in essence, the ability to guide the LLM. We don't tell it which word to choose; instead, we give it instructions, context and examples to increase the probability that it chooses the words and structure we need to get the best response. It's like whispering in the ear of an incredibly powerful machine to get its best version.
The two worlds of Prompt Engineering
There are two distinct types of prompt engineering:
1. Conversational
When you chat with ChatGPT, Claude or Gemini
2. Product-focused
When that prompt runs millions of times within an application
This guide is designed for day-to-day (conversational) use, but the techniques also work when you want to build products.
Prompt Techniques - From basic to advanced to ultra complex
1. Zero-shot: The simplest approach
It's the most direct approach: you just describe the task without giving any examples - that's what "zero-shot" means. The input can be a question, the start of a story or an instruction.
Example:
Prompt: Classify this movie review as POSITIVE, NEUTRAL or NEGATIVE.
Review: "I loved Superman, super good the more human version of the character and somehow showing current themes. I feel it makes it feel modern and appropriate for current times."
Output: POSITIVE
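In a chat you just type the prompt, but if you're calling a model from code, the same zero-shot prompt looks like this. A minimal sketch, assuming the OpenAI Python SDK; the model name is an illustrative choice, and any chat-style API works the same way:

```python
# Zero-shot classification through a chat API (sketch).
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

review = ("I loved Superman, a more human version of the character "
          "that somehow touches on current themes.")

response = client.chat.completions.create(
    model="gpt-4o-mini",  # illustrative model name
    messages=[{
        "role": "user",
        "content": "Classify this movie review as POSITIVE, NEUTRAL or NEGATIVE.\n"
                   f'Review: "{review}"\n'
                   "Sentiment:",
    }],
    temperature=0,  # classification benefits from deterministic output
)
print(response.choices[0].message.content)  # expected: POSITIVE
```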
When zero-shot doesn't work, that's where examples come in.
2. One-shot & Few-shot: The power of examples
When creating prompts for LLMs, adding examples is very helpful: they show the model what we're asking for, and they're especially useful when you want to "guide" the model toward a certain structure or pattern in the output.
- A one-shot prompt includes a single example in the prompt (hence the name one-shot). The idea is that the model has one example it can imitate to complete the task.
- A few-shot prompt is the same idea, but with several examples.
The number of examples depends on many factors, such as the complexity of the task, the quality of the examples and the capabilities of the model we're using. As a simple rule, include between 3 and 5 examples in a few-shot prompt. You may need more for complex tasks, or fewer because of your model's input length limit.
On the right task, few-shot prompting can take accuracy from near zero to very high; relative to its effort, it's one of the best techniques there is.
Important: When we add examples in our prompts, those examples have to be relevant to the task we're doing. These examples should be diverse, of good quality and well written. A small detail in the example can confuse the model and the results won't be as expected.
If your task involves edge cases or unusual inputs, it's good to include a few of them among the examples so the model learns to handle them.
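Here's a minimal sketch of how a few-shot prompt is assembled. The examples and labels are invented; in practice, replace them with diverse, high-quality examples from your real task:

```python
# Building a few-shot prompt from a handful of labeled examples (sketch).
EXAMPLES = [
    ("The pizza arrived cold and late.", "NEGATIVE"),
    ("Decent food, nothing special.", "NEUTRAL"),
    ("Best ramen I've had all year!", "POSITIVE"),
]

def build_few_shot_prompt(review: str) -> str:
    lines = ["Classify each review as POSITIVE, NEUTRAL or NEGATIVE.", ""]
    for text, label in EXAMPLES:
        lines.append(f'Review: "{text}"')
        lines.append(f"Sentiment: {label}")
        lines.append("")
    # The new input goes last, with the label left blank for the model to fill in.
    lines.append(f'Review: "{review}"')
    lines.append("Sentiment:")
    return "\n".join(lines)

print(build_few_shot_prompt("The soundtrack was amazing but the plot dragged."))
```

Send the resulting string to the model exactly as in the zero-shot sketch above.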
3. System, Context and Role
There are three common ways to guide how an LLM generates text: system prompting, contextual prompting and role prompting. Although they overlap, each focuses on a different aspect:
- System prompting: establishes the general context and purpose of the model. It's the "big picture" of what it has to do, like translating a language or classifying reviews.
- Contextual prompting: provides concrete details or background information relevant to the specific conversation or task. It helps the model better understand the nuances of what's being asked and adjust its response accordingly.
- Role prompting: assigns a character or identity to the model. With this, its responses become consistent with that role, both in style and in the type of knowledge it displays.
Obviously, there's significant overlap between these three types of prompts. For example, a prompt that defines a role can also provide context.
Even so, each serves a distinct main purpose:
- System prompt: defines the model's base capabilities and overall objective.
- Contextual prompt: provides immediate and specific information for the current task. It's dynamic and changes with each input.
- Role prompt: shapes the style, voice and personality with which the model responds.
Controversial opinion: role prompting ("You are a teacher ...") is largely ineffective. Recent research suggests it isn't that useful and produces only minimal improvements.
And since we're talking about system prompts, let's see the power hierarchy.

Power hierarchy between system prompt and user prompt
Power hierarchy (this is key):
System prompt > Developer instructions > User prompt > Context
In most LLMs, the system prompt has more weight than the user prompt. Let me explain why:
The system prompt works as a base framework or "rules of the game" that condition how the model processes any subsequent input. It defines the identity, purpose and limits of the model.
The user prompt is interpreted within that framework. So, even if the user asks for something different, the model prioritizes following the system instructions.
A simple example:
System prompt: "You are an English to Spanish translator."
User: "Translate this to French."
Result: The model will most likely respond in Spanish, because the system already established the general purpose.
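Here's that same translator example as an API call. A sketch assuming the OpenAI Python SDK; most chat APIs expose an equivalent system/user message split:

```python
# System prompt vs. user prompt (sketch).
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4o-mini",  # illustrative model name
    messages=[
        # System prompt: the framework every later message is interpreted in.
        {"role": "system", "content": "You are an English to Spanish translator."},
        # User prompt: even though it asks for French, most models will
        # answer in Spanish because the system prompt takes precedence.
        {"role": "user", "content": "Translate this to French: good morning"},
    ],
)
print(response.choices[0].message.content)
```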
Chain of Thought (CoT)
Chain of Thought prompting is a technique to improve LLMs' reasoning capabilities by generating intermediate reasoning steps. This helps the model deliver more accurate responses. It can be combined with few-shot prompting to get better results on more complex tasks that require reasoning before responding, since doing it in zero-shot mode (without examples) is challenging.

Comparison of Chain of Thought vs direct response
CoT Advantages:
- Low effort, very effective
- Works well even with "ready-to-use" LLMs
- Provides interpretability
- Allows identifying failures
- Usually improves robustness
CoT Disadvantages:
- More output tokens = slower and more expensive
- Response includes all the reasoning
Without CoT:
Prompt: When I was 3 years old, my friend was 3 times my age. Now I'm 20 years old. How old is my friend?
Output: Your friend is 63 years old. ❌
With CoT:
Prompt: When I was 3 years old, my friend was 3 times my age. Now I'm 20 years old. How old is my friend? Think step by step.
Output:
- When you were 3 years old, your friend was 3 × 3 = 9 years old
- Age difference: 9 - 3 = 6 years
- Now you're 20 years old
- Your friend is: 20 + 6 = 26 years old ✅
CoT can be useful for various use cases. Think about code generation, where you break the request down into a few steps and map them to specific lines of code. Or about creating synthetic data when you have some kind of seed, like "The product is called XYZ", and you want to walk the model through the assumptions it would make based on the product title. In general, any task that can be solved by "talking through the process" is a good candidate for CoT.
Remember the disadvantages: It's slower and more expensive. Currently almost all models are reasoners, meaning they use CoT under the hood.
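A small sketch of combining few-shot with CoT: the worked example in the prompt spells out its reasoning, so the model imitates that pattern for the new question. The example problem and its numbers are invented:

```python
# Few-shot Chain of Thought: one worked example, then the real question.
COT_PROMPT = """\
Q: When I was 6, my sister was half my age. Now I'm 40. How old is my sister?
A: When you were 6, your sister was 6 / 2 = 3, so she is 3 years younger.
   Now you are 40, so your sister is 40 - 3 = 37. The answer is 37.

Q: When I was 3 years old, my friend was 3 times my age. Now I'm 20 years old. How old is my friend?
A: Let's think step by step.
"""

# Send COT_PROMPT to your model of choice with a low temperature.
# The reasoning appears in the output, so parse the final line if you
# only need the number.
print(COT_PROMPT)
```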
Tree of Thought (ToT)
One of my favorite techniques back when it was trendy (in 2023). Tree of Thought is very similar to CoT, but it lets the LLM explore multiple reasoning paths simultaneously to reach an answer.

Tree of Thought diagram showing multiple reasoning paths
ToT guides the LLM through reasoning steps where, and this is the key, each step can branch into multiple paths, allowing the LLM to backtrack or explore alternatives if it considers it necessary.
ToT is impressive at solving games but has the major drawback of being very slow and expensive. Try this prompt and you'll see why I liked it:
ToT Example:
You are a facilitator who will coordinate a discussion between 3 specialized experts to solve a problem.
EXPERTS:
1. [Expert A]: Economics Specialist
2. [Expert B]: Sociology Specialist
3. [Expert C]: Public Administration Specialist
DELIBERATION PROCESS:
Phase 1 - Initial Analysis (3 rounds)
- Each expert analyzes the problem from their perspective
- Identifies key factors and possible approaches
- Proposes 2-3 initial solution paths
Phase 2 - Cross Evaluation (2 rounds)
- Each expert evaluates the proposals of the others
- Points out strengths, weaknesses and synergies
- Collaboratively refine the best ideas
Phase 3 - Synthesis and Consensus
- Integrate the best ideas into 2-3 candidate solutions
- Evaluate each solution (feasibility, impact, risks)
- Select and detail the optimal solution
RESPONSE FORMAT:
[Expert X]: "[Their analysis/proposal/evaluation]"
[Include step-by-step reasoning from each expert]
PROBLEM TO SOLVE:
Should the next Chilean government raise or lower taxes?
At the end, provide:
- CONSENSUS SOLUTION: [Detailed description]
- IMPLEMENTATION PLAN: [Concrete steps]
- CONSIDERATIONS: [Risks and mitigations]
ToT stopped being used as much because reasoning models work well for complex tasks.
ReAct (reason and act)
Reason and act is a paradigm that allows an LLM to reason, use external tools and take actions. ReAct combines reasoning and acting in a loop that repeats until the task is completed.
Basic ReAct Structure
A typical ReAct prompt follows this cyclical pattern:
- Thought: The model reasons about what to do next
- Action: Executes a specific action (search, calculation, query)
- Observation: Receives and processes the result
- Repeat until reaching the final answer

ReAct process diagram showing the Thought-Action-Observation cycle
It's a technique widely used by current agents like Claude Code and Cursor.
It's very useful because:
- It's transparent: each reasoning step is visible, so it's easy to find where an error occurred and understand why the model did what it did
- It's precise: breaking the process into smaller steps reduces errors and hallucinations
The downside is that you can't really use ReAct from the ChatGPT or Claude chat interface, since the chat can't pause and wait for external inputs between steps. :( This is the first technique in this guide that's only for the "product-focused" world.
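Here's a minimal sketch of that Thought → Action → Observation loop, assuming the OpenAI Python SDK. The model name, the prompt format and the single "calculator" tool are illustrative choices, not the canonical ReAct implementation:

```python
# Minimal ReAct loop with one toy tool (sketch).
from openai import OpenAI

client = OpenAI()

REACT_SYSTEM = (
    "Answer the question using this loop:\n"
    "Thought: reason about what to do next\n"
    "Action: calculator[<python expression>]   (the only available tool)\n"
    "Observation: the result of the action (provided by me)\n"
    "Repeat Thought/Action/Observation as needed, then end with\n"
    "Final Answer: <your answer>"
)

def run_react(question: str, max_steps: int = 5) -> str:
    messages = [{"role": "system", "content": REACT_SYSTEM},
                {"role": "user", "content": question}]
    for _ in range(max_steps):
        reply = client.chat.completions.create(
            model="gpt-4o-mini",  # illustrative model name
            messages=messages,
            temperature=0,
        ).choices[0].message.content
        messages.append({"role": "assistant", "content": reply})
        if "Final Answer:" in reply:
            return reply.split("Final Answer:")[-1].strip()
        if "Action: calculator[" in reply:
            expr = reply.split("Action: calculator[")[-1].split("]")[0]
            # Toy tool: never eval untrusted model output in real code.
            messages.append({"role": "user", "content": f"Observation: {eval(expr)}"})
        else:
            return reply  # the model went off-script; return what it said
    return "No final answer within the step limit"

print(run_react("What is 17% of 2340, plus 58?"))
```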
Meta prompting
Meta prompting is asking an LLM to generate a prompt for you. It's the dream of pioneers like Von Neumann or Turing: telling the machine to program itself.
In my experience, Claude is the best at writing prompts, and each LLM tends to be best at creating prompts for itself. This makes sense: each model was trained with certain patterns and language structures, so it naturally "understands" what type of instructions resonate best with its own architecture. It's like asking someone to write a note for themselves versus writing it for another person - they'll always know better what words to use for their own comprehension.
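A tiny sketch of what a meta prompt can look like; the task (summarizing support tickets) is just an invented example:

```python
# Meta prompting: ask the model to write the prompt for you (sketch).
META_PROMPT = """\
You are an expert prompt engineer. Write a reusable prompt that makes an LLM
summarize a customer support ticket into exactly 3 bullet points:
problem, probable cause, and next step. Specify the output format and
include one short example inside the prompt. Return only the prompt.
"""

# Send META_PROMPT to the same model you'll later run the generated prompt on:
# as noted above, each model tends to write prompts that suit itself best.
print(META_PROMPT)
```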
Advanced Techniques
Below I'll leave a list of advanced reasoning techniques that can be useful for improving the accuracy and coherence of responses. These techniques represent the evolution of prompt engineering, from simple instructions to complex structured thinking architectures.
I've experimented with several of them, though I admit I haven't gone deep into all of them. My experience has taught me that in 90% of cases, a well-executed combination of few-shot learning along with Chain of Thought (CoT) is all you need. It's like the Pareto principle applied to prompting: 20% of the techniques give you 80% of the results.
However, knowing these advanced techniques is valuable. Each has its ideal moment and context, and understanding when to apply them can make the difference between a good response and an exceptional one. Think of them as specialized tools in your toolbox: you don't always need them, but when you do need them, they're irreplaceable.
Advanced reasoning techniques
1. Chain-of-Thought with Self-Consistency (CoT-SC)
Generates multiple independent reasoning chains, then votes for the most consistent answer. More robust than simple CoT (see the sketch after this group of techniques).
2. Tree of Thoughts with Backtracking (ToT++)
Extension of ToT that allows backtracking when a branch doesn't work. Includes heuristic evaluation of each node. Better for complex search problems.
3. Graph of Thoughts (GoT)
Evolution of ToT where thoughts form graphs, not just trees. Allows merging and splitting reasoning lines. Useful for problems with multiple interdependent solutions.
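As referenced in technique 1, here's a minimal sketch of self-consistency: sample several CoT answers with some temperature, extract each final answer, and keep the majority vote. It assumes the OpenAI Python SDK, an illustrative model name and a prompt convention where answers end with "The answer is <X>":

```python
# Chain-of-Thought with Self-Consistency via majority vote (sketch).
from collections import Counter
from openai import OpenAI

client = OpenAI()

def self_consistent_answer(question: str, n_samples: int = 5) -> str:
    prompt = question + "\nThink step by step, then finish with: The answer is <X>"
    answers = []
    for _ in range(n_samples):
        reply = client.chat.completions.create(
            model="gpt-4o-mini",   # illustrative model name
            messages=[{"role": "user", "content": prompt}],
            temperature=0.8,       # diversity between samples is the whole point
        ).choices[0].message.content
        if "The answer is" in reply:
            answers.append(reply.split("The answer is")[-1].strip(" .<>"))
    # Majority vote across the independent reasoning chains.
    return Counter(answers).most_common(1)[0][0] if answers else "no consistent answer"

print(self_consistent_answer(
    "When I was 3 years old, my friend was 3 times my age. Now I'm 20. How old is my friend?"
))
```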
Self-Improvement Techniques
4. Self-Refine
The model generates → critiques → refines iteratively. Example: "Generate a solution, identify its weaknesses, then improve it"
5. Reflexion
Learns from previous failed attempts. Maintains a "memory" of errors and uses them to improve. Particularly useful for code and math tasks.
6. Constitutional Self-Correction
Applies constitutional principles after generating. "Review your answer: is it accurate? is it helpful? does it avoid harm?"
Emerging Techniques 2024-2025
7. Chain-of-Verification (CoVe)
Generates answer → creates verification questions → verifies → corrects. Significantly reduces hallucinations.
8. Thread-of-Thought (ThoT)
Maintains multiple parallel reasoning threads. Threads communicate with each other during the process.
9. Skeleton-of-Thought (SoT)
First generates response skeleton. Then expands each section in parallel. Reduces latency for long responses.
Conclusion
Prompt engineering is a fundamental skill in 2025 and the coming years. You don't need to be an AI expert to start - you just need to understand these techniques and practice.
My advice: start with zero-shot, then add examples (few-shot), and when you need more power, use Chain of Thought (CoT). Advanced techniques are great, but in 80% of cases, a good few-shot with CoT is all you need, obviously using a reasoning model.
And remember: writing good prompts is more art than science. The only way to improve is by practicing.
Prompt engineering is a living discipline, and right now it matters more than ever.
A good prompt can work magic, and a bad one can break the LLM's behavior, especially in products.