How to get what you want from an LLM

[Image: rendering of an old-fashioned vending machine with a typewriter interface]

I’m creating a software agent that writes code to handle user requests (for sales and marketing use cases). I’ve easily written over 100 prompts this past year. Here’s how I’m approaching it these days.

I’ve been writing many prompts to be executed via the GPT and Claude APIs, and I keep them in a platform-independent format. GPT and Claude use different conversational formats, and Claude is faster and cheaper (and seems to be great at the tasks I need it to do), so I’ve resisted GPT function calling to avoid lock-in. For the time being I want to be able to switch between the two based on cost and performance.
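
A minimal sketch of what that can look like, assuming a simple role-tagged format of my own and the older Human:/Assistant: text format on the Claude side (the helper names are illustrative, not a real library API):

```python
# Prompts are stored as role-tagged turns, independent of either vendor's API.
prompt = [
    {"role": "user", "content": "Who do we know that is an ML-ops expert?"},
]

def to_gpt_messages(prompt):
    # GPT's chat API consumes {"role": ..., "content": ...} messages directly.
    return prompt

def to_claude_text(prompt):
    # Claude's text-completion format is a Human:/Assistant: transcript,
    # ending with an open Assistant: turn for the model to fill in.
    rendered = ""
    for turn in prompt:
        speaker = "Human" if turn["role"] == "user" else "Assistant"
        rendered += f"\n\n{speaker}: {turn['content']}"
    return rendered + "\n\nAssistant:"
```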

I’ve come to appreciate “what’s unsaid” 

Working with a chatbot is generally an iterative, conversational process. By providing additional information and clarification as the conversation unfolds, you can narrow in on exactly the type of response you need from the LLM.

Example 

Human: “Who do we know that is an ML-ops expert?” 

Edgar: “Do you mean what contacts we have in the contact database, someone you’ve emailed… or?”

Trying to get there in one go (a single API call with prompt and completion) is harder. Imagine assigning a junior employee a task without giving them a chance to ask any clarifying questions. This is what I mean by “what’s unsaid.” Writing a bunch of prompts has made me more aware of the assumptions and expectations I have about the output the LLM should generate. Good prompts state these expectations and guidelines explicitly, and they often evolve over time.
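
For example, a one-shot version of the ML-ops question has to carry the clarifications Edgar would otherwise ask for. A hypothetical rewrite:

Who do we know that is an ML-ops expert? By “know,” I mean anyone in our CRM contact database. By “ML-ops expert,” I mean someone whose title or bio mentions ML-ops or machine learning infrastructure. List up to five names; if no one matches, say NONE.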

Another way to think of it is like sculpting. You start from a vast, effectively infinite space of possible completions, and you must use your prompt to carve that space down until the only thing left to say is the right answer. This is the idea behind the “think step by step” prompt: it gets the LLM to logically narrow in on the desired answer.

This goes a long way toward preventing hallucination (assuming that’s what you want). Additionally, you can prescribe answers to choose from (ask for a multiple-choice answer), or give the LLM a way out: “answer UNKNOWN if you don’t know.”
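
A hypothetical example combining both techniques:

Based only on the context above, what type of company is this?
Reply with exactly one of: PUBLIC, PRIVATE, or SUBSIDIARY. If the context doesn’t say, reply UNKNOWN.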

Evals = Tests (for the edge cases)

I was late to the game with writing evals. When you have fuzzy inputs, it’s important to maintain base cases and edge cases that you expect to work. Otherwise, as you tweak the prompt, a change that seems obviously better in all cases may surprise you when “easier” completions stop working.
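
In practice, an eval can be little more than a table of inputs and expected completions replayed against the current prompt. A minimal sketch, with `complete()` as a stand-in for whichever API client is in use:

```python
# Each case pairs an input with the completion the prompt should produce.
CASES = [
    {"context": "Toyota is listed on the NYSE.", "expect": "PUBLIC"},
    {"context": "No ownership information was found.", "expect": "UNKNOWN"},
]

def run_evals(prompt_template, complete):
    failures = []
    for case in CASES:
        prompt = prompt_template.format(context=case["context"])
        answer = complete(prompt).strip()
        if answer != case["expect"]:
            failures.append((case["context"], case["expect"], answer))
    return failures  # an empty list means every case still passes
```

Running this after every prompt tweak is what catches the “easier” completions quietly breaking.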

Iterative prompts are powerful

My favorite prompts are ones that refine themselves in conjunction with a code script. For example:

We’re searching for information on “Toyota motor company”
We’ve already searched for:
- Toyota market cap
- Toyota headquarters
We found:
- Toyota’s market cap is $310B 
- Toyota’s corporate headquarters is in Toyota, Aichi, Japan
What should we search for next? If there is nothing left to search for, reply with DONE.

The LLM returns a completion containing a suggested search. The code does the search, and adds the information to the prompt in the next go-round. In this way, the LLM becomes a simple iterative research agent. This can be used in all kinds of scenarios, like “we need to get the user’s preferences, here’s what we know so far, what should we ask next?”
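
A sketch of that loop, with hypothetical `complete()` and `search()` helpers standing in for the LLM call and the actual search:

```python
def research(topic, complete, search, max_rounds=10):
    searched, found = [], []
    for _ in range(max_rounds):
        # Rebuild the prompt each round with everything learned so far.
        prompt = (
            f'We\'re searching for information on "{topic}"\n'
            "We've already searched for:\n"
            + "".join(f"- {q}\n" for q in searched)
            + "We found:\n"
            + "".join(f"- {item}\n" for item in found)
            + "What should we search for next? "
            "If there is nothing left to search for, reply with DONE."
        )
        next_query = complete(prompt).strip()
        if next_query == "DONE":
            break
        searched.append(next_query)
        found.append(search(next_query))  # feeds into the next round's prompt
    return found
```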

Hat tip to Brian Fioca: this is the simplest version of what he calls a “scenario prompt.”

Keep it conversational

When building these sophisticated prompts, it’s easy to end up with something that, looked at holistically, is really messy: an oddly pasted-together collage of “here’s some examples: follow these steps! Always return JSON!” I’ve found that, once you’ve got the key elements in place, it usually helps to edit your prompt so it flows naturally, as if you had taken the time to write a clear question via a chat interface. In other words, keep it conversational! The models are fine-tuned on conversations with humans that lead to successful outcomes, and they will respond better if your dynamically generated prompt reads like a human wrote it.
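
For example, rather than a stitched-together prompt like:

Here’s some examples: [examples]. Follow these steps! Always return JSON!

the same instructions read better as:

Here are a few examples of good answers: [examples]. Please answer the question below in the same style, and format your answer as JSON.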