Elevate Your AI Interactions: A Guide to Crafting Powerful Prompts

Bhavya Barri
10 min read · Sep 25, 2024


Introduction

I recently realised there's been a paradigm shift in how I search these days. Have to settle a debate on who's right? Perplexity. Have to understand a technical concept? Claude. LLMs are essentially changing how we approach intellectual labor.

Hear me out,

Like most Indians, I too had these two questions in the last few days: “What are the best seats for the Coldplay India concert?” And if you’re thinking getting tickets was easy, think again. My next question was: “What are the chances of getting a ticket?” 🫣

The ideal place to check this out is Reddit. I opened Perplexity, set the Focus mode to Social, and hit Go. This is the answer I got, along with links to Reddit.

PS: Having read a lot of subreddits, I know this is EXACTLY what everyone has been saying.

Why all this? Right now, AI has both the capability and ubiquity to significantly improve your productivity (by more than 20%). We have crossed the product Rubicon. AI is no longer a string of demos that don't hold up; it has been incorporated into genuinely useful products like Claude and Perplexity, made possible by a combination of model improvement and product development.

So when someone these days says “I don’t find GPT that useful,” I just assume that it’s because they don’t know how to prompt well. After trying out a bunch of techniques and reading tons of good blogs on prompting, I’ve put together a small guide.

Mental model: Prompts as conditioning

Let’s simplify how we think about large language models. LLMs are super-smart probabilistic models. They take in an input and give out responses based on patterns they’ve learned from a huge amount of data — kind of like an average of what’s on the internet.

But here’s the thing: because it’s an average, the usual responses you get might seem a bit boring or generic. GPT can do all sorts of amazing things — you just need to learn how to ask it the right way. At its core, prompt engineering is steering (conditioning) the model to get that desired output.

Example: Consider the prompts below. The first will likely generate a response about Apple the tech company. The second will describe the fruit. And the third will explain the idiom.

# Prompt 1
Tell me about Apple
# Prompt 2
Tell me about the fruit apple
# Prompt 3
Tell me about Apple of my eye

We have conditioned the model to respond differently by simply adding a few tokens.

10 Principles of Prompting with examples

1. Give Direction

Describe the desired style in detail or reference a relevant persona to guide the AI’s response.

Simply referencing a relevant persona gets better results than a generic prompt. The second article at least feels like it was written by a human rather than entirely generated (as the first one does).

# Prompt 1
Write a blog post on "Prompt Engineering for Product Managers"

Result: https://claude.site/artifacts/1013bfb0-2e0c-4dfc-9b77-2f9fc40e6160

# Prompt 2
Write a blog post on "Prompt Engineering for Product Managers" in style
of Paul Graham's writing

Result: https://claude.site/artifacts/4b6b9025-8da7-451d-9490-fbd16bf8bebf

2. Specify Format

Define input and output rules, establish the structure of the response, and set clear constraints.

Structured input helps the LLM better understand the task and input, improving the quality of output. Structured output makes it easier to parse responses, simplifying integration with downstream systems.

In this example, we ask Claude to extract attributes from a product <description>.

<description>

The Marshall Emberton is a compact portable Bluetooth speaker available in
classic Black and Brass or Cream for $169.99. At just 2.68 inches tall and
6.30 inches wide, it delivers the legendary Marshall sound anywhere you go.
With 20+ hours of playtime and an IPX7 water-resistance rating, this rugged
little powerhouse brings iconic audio to your outdoor adventures or home
listening sessions. The Emberton's True Stereophonic multi-directional sound
offers a 360° experience that will impress any audiophile.

</description>

Extract the <name>, <size>, <playtime> and <price> from this product
<description>

# Claude's response
<name>Marshall Emberton</name>
<size>2.68 inches tall, 6.30 inches wide</size>
<playtime>20+ hours</playtime>
<price>$169.99</price>
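Once the model returns tagged output like this, downstream code can parse it mechanically. Here's a minimal sketch in Python (the `extract_tags` helper is my own illustration, not part of any SDK):

```python
import re

def extract_tags(response: str, tags: list[str]) -> dict[str, str]:
    """Pull the text inside each <tag>...</tag> pair out of a model response."""
    extracted = {}
    for tag in tags:
        # Non-greedy match so nested or repeated tags don't over-capture
        match = re.search(rf"<{tag}>(.*?)</{tag}>", response, re.DOTALL)
        extracted[tag] = match.group(1).strip() if match else ""
    return extracted

# A model response shaped like the example above:
response = """<name>Marshall Emberton</name>
<size>2.68 inches tall, 6.30 inches wide</size>
<playtime>20+ hours</playtime>
<price>$169.99</price>"""

attrs = extract_tags(response, ["name", "size", "playtime", "price"])
print(attrs["price"])  # $169.99
```

This is why structured output simplifies integration: the parsing step becomes a one-liner instead of fragile free-text scraping.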

3. Provide Examples

Supply a diverse set of test cases where the task was done correctly. Use n-shot prompting for better results.

You might have heard of “zero shot,” “few shot,” or “many shot.” “Shot” refers to the number of examples the model was given of the task done correctly. One reason this works is that it is often easier to provide an example than explain why you like it.

Adding a single example to the prompt can take accuracy from 10 percent to 45 percent.

Callout: Adding examples helps, but can be a tradeoff between creativity and reliability (how often it follows your instructions correctly).

Important tips for maximum effectiveness:

  1. If we provide too few examples, we risk the model overfitting to them (having at least a dozen works). Most academic evals use 32-shot or 64-shot prompts.
  2. More importantly, the examples we input should be representative of the expected production inputs. Your examples should mirror your actual use case.
  3. Also, take care to match the distribution of examples to production data. If 80% of production inputs are positive, the n-shot prompt should reflect that ratio too.
  4. The number of examples to add also depends on task complexity. For example, if we are planning to get a structured JSON output, adding 5 examples of output pairs should suffice.
  5. Your examples should be wrapped in <example> tags for structure. If you have multiple examples, nest them within <examples> tags. (Claude specific)
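The tips above can be sketched as a small prompt builder. This is an illustration (the function name and the example reviews are made up), but it shows the `<example>` nesting inside `<examples>` and a label distribution matched to the task:

```python
def build_few_shot_prompt(task: str, examples: list[tuple[str, str]], query: str) -> str:
    """Assemble an n-shot prompt, wrapping each example in <example> tags
    nested inside <examples> tags."""
    parts = [task, "<examples>"]
    for inp, out in examples:
        parts.append(f"<example>\nInput: {inp}\nOutput: {out}\n</example>")
    parts.append("</examples>")
    # End with the real query so the model continues in the same pattern
    parts.append(f"Input: {query}\nOutput:")
    return "\n".join(parts)

prompt = build_few_shot_prompt(
    "Classify the sentiment of each review as positive or negative.",
    [
        ("Sound quality is incredible for the size.", "positive"),
        ("Battery died within a week.", "negative"),
        ("Takes my calls clearly, love it.", "positive"),
        ("Stopped charging after a month.", "negative"),
    ],
    "The bass is punchy and it survived a rainstorm.",
)
print(prompt)
```

In production you'd want far more than four examples (see tip 1), drawn from real inputs rather than invented ones.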

4. Evaluate Quality

Identify errors and rate responses. Test what drives performance to improve output.

If you are planning to integrate this in production at scale, this is important. This can be as simple as copy-pasting responses in Google Sheets and counting how often it went wrong.

  • Evaluating prompt quality is crucial when using prompts at scale. It involves identifying errors and tracking performance across multiple edge cases.
  • A/B testing different prompt strategies and variations can help optimize performance, but this process can be time-consuming and challenging to measure effectively.
  • The most reliable evaluation method is human ratings. These are often combined with automated metrics and AI-assisted evaluation, to ensure consistent improvement in prompt results.
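The "count how often it went wrong" approach can be sketched in a few lines, assuming exact-match labels (a simplification; real evals often need fuzzier scoring or human judgment):

```python
def accuracy(outputs: list[str], expected: list[str]) -> float:
    """Fraction of model outputs that exactly match the expected label.
    This is the code version of pasting responses into a spreadsheet
    and counting mismatches."""
    assert len(outputs) == len(expected)
    correct = sum(o.strip().lower() == e.strip().lower()
                  for o, e in zip(outputs, expected))
    return correct / len(expected)

# Compare two prompt variants on the same labeled test cases:
variant_a = ["positive", "negative", "positive", "positive"]
variant_b = ["positive", "negative", "negative", "positive"]
labels    = ["positive", "negative", "negative", "positive"]
print(accuracy(variant_a, labels))  # 0.75
print(accuracy(variant_b, labels))  # 1.0
```

Even a crude metric like this makes A/B testing of prompt variations repeatable instead of vibes-based.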

5. Divide Labor

Split complex tasks into multiple steps, chaining them together for better results. Use Chain of Thought (CoT) prompting.

Giving the LLM "time to think" before generating a response helps it break down the problem and condition its own response, often leading to better results.

You can do this in different ways:

  1. Basic prompt: Adding the phrase "think step by step" to the prompt gives the model no guidance on how to think, but it works well for basic use cases. It is less ideal when a task is very specific to your app, use case, or organization.
Basic Chain of Thought example

2. Structured prompt: One idea is to contain the CoT within a designated scratchpad tag (like the <workbench> below) and then generate the final output based on it. You can also add XML tags like <thinking> and <answer> to separate reasoning from the final answer.

Claude, you are tasked with analyzing and providing key insights from the 
customer feedback <survey_data>.

<survey_data>
{survey_data}
</survey_data>

Think step by step on how to analyze the <survey_data> within the provided
<workbench>.

Then, return an <insights_report> based on the <workbench>.

3. Guided prompt: Outline specific steps for the model to follow along.

Claude, you are responsible for analyzing a scientific research paper <paper>.

<paper>
{paper_content}
</paper>

Think step by step on how to analyze the <paper> within the provided
<analysis_space>.

In the <analysis_space>, return a list of <key_findings>, <methodology>,
and <limitations>.

Then, check that <analysis_space> items are factually consistent with the
<paper>.

Finally, return a <research_summary> based on the <analysis_space>.
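Chaining can be sketched as two model calls, where the first step's output feeds the second. This is an illustration, not a real integration: `llm` here is any prompt-to-text callable, and the stub below stands in for an actual API call so the sketch runs on its own.

```python
def chain(llm, document: str) -> str:
    """Two-step chain: first ask for analysis inside <analysis_space> tags,
    then feed that analysis back to produce the final summary."""
    step1 = llm(
        "Think step by step and analyze the <paper> below inside "
        "<analysis_space> tags.\n"
        f"<paper>{document}</paper>"
    )
    step2 = llm(
        "Write a <research_summary> based only on this analysis:\n"
        f"{step1}"
    )
    return step2

# A stub model so the sketch runs without an API key:
fake_llm = lambda prompt: f"[model response to {len(prompt)} chars of prompt]"
print(chain(fake_llm, "Attention is all you need..."))
```

Splitting the task this way also lets you inspect (or validate) the intermediate analysis before the summary is generated.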

6. Assign Role and Responsibility

Give the AI a specific role or persona to adopt, which can help frame its responses appropriately.

Example 1 -

Prompt — You are a preschool teacher. Explain how attention in LLMs works.

Response —

Gather 'round, little ones! Today we're going to learn about something very special called "attention" that helps smart computers understand words better.

Imagine you have a big box of colorful building blocks. Each block has a different word written on it. Now, when the computer wants to understand a sentence, it's like it's building a tower with these blocks.

But here's the magic part! The computer has special glasses that help it see which blocks fit together best. When it puts on these magic glasses and looks at one block, some of the other blocks start to glow! The ones that glow the brightest are the ones that go best with the block it's looking at.

So, as the computer builds its sentence tower, it keeps putting on its magic glasses to see which blocks glow the brightest with each new block it adds. This helps it build a tower that makes sense!

Sometimes, it might look back at blocks it used earlier to make sure everything still fits together nicely. It's just like when we're reading a story and we remember what happened at the beginning to understand what's happening now.

And that's how the computer pays attention to make sure it understands everything correctly. Isn't that neat? Now, who wants to build their own sentence towers with our classroom blocks?

Prompt — You are an NLP professor. Explain how attention in LLMs works.

Response -

Attention in Large Language Models (LLMs) is a mechanism that allows the model
to focus on relevant parts of the input when processing or generating text.
Key points:

1. Self-Attention: Each token in a sequence attends to all others, using Query (Q), Key (K), and Value (V) vectors.
2. Computation: Attention scores are calculated as scaled dot products between Q and K, then used to weight V vectors.
3. Multi-Head Attention: Multiple attention mechanisms run in parallel, capturing different aspects of the input.
4. Positional Encoding: Added to preserve sequence order information.
5. Masking: Used in decoders to prevent attending to future tokens.
6. Transformer Architecture: Attention is a core component, combined with feed-forward layers and normalization.
7. Scalability: Attention's quadratic complexity with sequence length can be a challenge for very long inputs.

This mechanism enables LLMs to capture long-range dependencies and create context-aware representations, crucial for their performance across various NLP tasks.

Example 2:

Prompt 1: Review this code snippet for security vulnerabilities.

Prompt 2: You are a senior cybersecurity engineer specializing in application security. Review this code snippet for security vulnerabilities.

Prompt 3: Your task is to protect users from potential cyber threats by identifying any and all security vulnerabilities in code. Review this code snippet for security vulnerabilities.

This example shows how roles and responsibilities can improve an LLM’s code security analysis:

  1. Basic Prompt: May yield general results.
  2. Role-Based Prompt: Invokes specialized knowledge of a cybersecurity expert, potentially uncovering more sophisticated vulnerabilities.
  3. Responsibility-Based Prompt: Encourages vigilance by framing the task as user protection, likely increasing recall of potential issues.

7. Prefill Responses

Provide partial responses for the AI to complete, guiding it towards the desired output format. This works especially well for Claude.

To prefill, include the desired initial text in the Assistant message. Claude’s response will continue from where the Assistant message leaves off:

Example 1: Controlling output formatting and skipping the preamble. Prefilling a { forces Claude to skip the preamble and directly output a JSON object.

Another use case is prefilling a bracketed [ROLE_NAME]. This can remind Claude to stay in character, even for longer and more complex conversations. This is especially powerful when combined with role prompting in the system parameter.
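Here's a sketch of what prefilling looks like with a messages-style API. The messages list is the real shape of the technique (the last turn is a partial assistant message), but the continuation string below is simulated rather than a real model reply:

```python
import json

# The last message in the list is an *assistant* turn containing just "{".
# The model will continue from that prefix, so its reply starts mid-JSON
# with no preamble.
messages = [
    {"role": "user", "content": "Extract name and price from: "
                                "Marshall Emberton, $169.99. Respond in JSON."},
    {"role": "assistant", "content": "{"},  # the prefill
]

# Simulated continuation (what the model would send back):
continuation = '"name": "Marshall Emberton", "price": "$169.99"}'

# Downstream code re-attaches the prefill before parsing:
data = json.loads(messages[-1]["content"] + continuation)
print(data["price"])  # $169.99
```

Note the parsing step: because the model's reply continues from the prefill, you concatenate the two before handing the result to `json.loads`.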

8. Long context prompting

Strategically place important information within the prompt to maximize its impact on the AI’s response.

  1. According to Anthropic’s official blog, putting long-form documents at the top of the prompt improves Claude’s performance.
  2. Pattern to use for Claude:
  • Role or responsibility (usually brief)
  • Context/document
  • Specific instructions
  • Prefilled response

3. Structure document content and metadata with XML tags. When using multiple documents, wrap each document in <document> tags with <document_content> and <source> (and other metadata) subtags for clarity.

4. Ground responses in quotes: For long document tasks, ask Claude to quote relevant parts of the documents first before carrying out its task. This helps Claude cut through the “noise” of the rest of the document’s contents.
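The document-wrapping pattern above can be sketched as a small helper (the helper name, document names, and contents are invented for illustration):

```python
def wrap_documents(docs: list[dict]) -> str:
    """Wrap each document in <document> tags with <source> and
    <document_content> subtags."""
    blocks = []
    for i, doc in enumerate(docs, start=1):
        blocks.append(
            f'<document index="{i}">\n'
            f"<source>{doc['source']}</source>\n"
            f"<document_content>\n{doc['content']}\n</document_content>\n"
            f"</document>"
        )
    return "<documents>\n" + "\n".join(blocks) + "\n</documents>"

# Role first (brief), then documents, then specific instructions:
prompt = (
    "You are a research assistant.\n\n"
    + wrap_documents([
        {"source": "q3_report.txt", "content": "Revenue grew 12%..."},
        {"source": "q4_report.txt", "content": "Revenue grew 8%..."},
    ])
    + "\n\nQuote the relevant lines first, then compare revenue growth."
)
print(prompt)
```

The final instruction also applies tip 4: asking for quotes first grounds the answer in the documents before the comparison is made.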

9. Address Hallucinations

Implement strategies to minimize false or irrelevant information in the AI’s outputs.

Although this isn’t foolproof, it works for tasks involving extraction or question answering. When you’re setting up a task for an AI to extract information or answer questions, it’s a good idea to give it permission to admit when it’s not sure. You might want to tell it something like, “It’s okay to say ‘I don’t know’ or ‘Not applicable’ or ‘This doesn’t apply’ if you’re not certain.” Also, consider telling the AI to only give an answer when it feels really confident about it.

Example

Claude, answer the following question based on the provided <context>.

<context>
{context}
</context>

If the question CANNOT be answered based on the <context>, respond with
"I don't know". Only provide an answer if you are highly confident it is
factually correct.

Question: {question}

Answer:

10. Iterate and Refine

Continuously improve your prompts based on the AI’s responses, adjusting your approach as needed. Simple things like pointing out faults in the output and asking Claude to improve the prompt accordingly work wonderfully at times too.

Hope this is useful!
