Six months ago, I set out to build a software agent using LLMs. Today, I have a library of 700+ mostly machine-generated skill functions and a rough process for handling unseen customer requests. As it stands, the Edgar project is more of a 100x engineer-enabler than AGI, but the dream is very much alive.
An agent (in the current industry understanding) should handle open-ended requests from users, with real-time clarification, planning, and working memory.
Today, AgentGPT is an open-source embodiment of that. However, I think any prompt chain (or assemblage of prompt chains) with working memory can be considered the beginnings of an agent. Such assemblages are sometimes also called “multi-agent” architectures.
I’ve been lucky to work on Edgar full time for the last six months. Edgar started as an experiment to implement the Voyager strategy: dynamic code generation, applied to business workflows rather than Minecraft. Today, Edgar’s 700+ skills primarily implement API connections and sales and marketing prompt chains.
Here is an example of a simple Edgar skill:
And one of Edgar’s more complex skills:
(The diagram shows a call graph of an assemblage of skills, with the top-level skill at the left.)
Naively, I also hooked Edgar up to Slack (as a GPT- and Claude-enabled chatbot) so my co-workers could ask it to complete business tasks. The original version of Edgar would write a new skill for every request, assembling it from existing skills in its library. Edgar was great at requests like:
“make a powerpoint slide, where each slide shows the photo (use contacts.photo_url) and name (use contacts.display_name) for 100 people from the contacts database.”
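To make “assembling a new skill from existing skills” concrete, here is a minimal Python sketch of what a generated skill for that request might look like. Everything in it is hypothetical: the `SKILLS` registry, the `query_contacts` and `build_slide` skills, and the in-memory contact data stand in for Edgar’s real library and contacts database, and the “slides” are plain dictionaries rather than an actual PowerPoint file.

```python
from typing import Callable

# Hypothetical skill registry (skill name -> function), standing in for Edgar's library.
SKILLS: dict[str, Callable] = {}

def skill(fn: Callable) -> Callable:
    """Register a function as a reusable skill under its own name."""
    SKILLS[fn.__name__] = fn
    return fn

# Hypothetical in-memory stand-in for the contacts database.
CONTACTS = [
    {"display_name": f"Person {i}", "photo_url": f"https://example.com/photos/{i}.jpg"}
    for i in range(250)
]

@skill
def query_contacts(limit: int) -> list[dict]:
    """Existing skill: fetch up to `limit` contacts."""
    return CONTACTS[:limit]

@skill
def build_slide(title: str, image_url: str) -> dict:
    """Existing skill: describe one slide (a real version might write a .pptx file)."""
    return {"title": title, "image": image_url}

@skill
def contact_photo_deck(limit: int = 100) -> list[dict]:
    """New, generated skill: composes the two existing skills via the registry."""
    return [
        SKILLS["build_slide"](c["display_name"], c["photo_url"])
        for c in SKILLS["query_contacts"](limit)
    ]

deck = contact_photo_deck()
print(len(deck), deck[0])
```

Routing calls through a registry by name, rather than through direct imports, is one way to let generated code discover and reuse skills; it is a design choice for this sketch, not a description of Edgar’s internals.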
I could “program” Edgar via chat, but of course my co-workers asked things like “who do we know in the semiconductor industry?” That was too vague for Edgar to handle.
Over time, we’ve broken Edgar’s workflows out into narrow prompt chains and turned off the chat. As a result, Edgar feels less like an agent. The plan is to bring it all back together into a cohesive working system. OpenAI’s recent “GPTs” launch could be a great way to support a separate Edgar instance for each of our trial customers. The next stage will involve pre-training Edgar with relevant skills and letting each customer invoke them with their own context.
Heading into the AI.Engineer conference, I was unsure what an agent even was anymore. But my discussions with others there quickly crystallized my thinking.
From the start, it was obvious to me that an LLM (witness BabyAGI and AgentGPT) was not going to be able to plan and execute user requests in real time; it would be too expensive and error-prone. I think code generation (and interpretation) will be core to any AGI strategy: even a brilliant reasoning engine needs an endless variety of code to call API endpoints and get things done. Edgar’s skills will ultimately become the training set, because they codify customer preferences, best practices, and evolved prompt chains. The GitHub issues, temporary files, and databases Edgar connects to are its working memory.
The Cognitive Architectures for Language Agents paper (CoALA) proposed a generalized architecture based on a survey of several agent projects. It does not incorporate code generation.
I think the best (and most tangible) “test” of agenthood I’ve seen is the AutoGPT evals. With some skill preparation, Edgar will easily be able to conquer these (and I’m eager to do that). But that’s not the real test. The real test is whether Edgar can then handle a similar, previously unseen request, building or executing the new skill without user intervention.
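That test can be framed as a reuse-or-generate loop in the spirit of Voyager: look for an existing skill that fits the request, and only write a new one when none does, adding it to the library so the next similar request can reuse it. The Python sketch below rests on loud assumptions. Skills are matched by simple word overlap (Jaccard similarity) with their descriptions, a crude stand-in for embedding-based retrieval; `generate_skill` is a hypothetical stub where an LLM code-generation-and-test step would go; and the 0.3 threshold and the example skills are made up for illustration.

```python
import re
from dataclasses import dataclass
from typing import Callable

@dataclass
class Skill:
    name: str
    description: str
    fn: Callable[..., object]

def tokenize(text: str) -> set[str]:
    """Lowercase word set used for crude similarity matching."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def similarity(request: str, skill: Skill) -> float:
    """Jaccard overlap between request words and skill-description words."""
    a, b = tokenize(request), tokenize(skill.description)
    return len(a & b) / len(a | b) if a | b else 0.0

def generate_skill(request: str) -> Skill:
    """Hypothetical stub: a real agent would prompt an LLM to write and test code here."""
    def placeholder(**kwargs):
        return f"TODO: generated code for {request!r}"
    name = "_".join(re.findall(r"[a-z0-9]+", request.lower()))
    return Skill(name=name, description=request, fn=placeholder)

def handle(request: str, library: list[Skill], threshold: float = 0.3) -> tuple[Skill, bool]:
    """Return (skill, reused). Reuse the best match above threshold, else generate one."""
    best = max(library, key=lambda s: similarity(request, s), default=None)
    if best is not None and similarity(request, best) >= threshold:
        return best, True
    new_skill = generate_skill(request)
    library.append(new_skill)  # the library grows, so a repeat request is reused
    return new_skill, False

library = [
    Skill("contact_photo_deck", "make a powerpoint deck of contact photos and names", lambda **kw: "deck"),
    Skill("list_contacts_by_industry", "list contacts who work in a given industry", lambda **kw: "contacts"),
]

for request in [
    "list contacts in the semiconductor industry",  # close to an existing skill
    "summarize recent sales calls",                 # unseen: a new skill is generated
    "summarize recent sales calls",                 # now reused from the library
]:
    skill, reused = handle(request, library)
    print(f"{request!r} -> {skill.name} (reused={reused})")
```

Word overlap is far too brittle for real requests like “who do we know in the semiconductor industry?”, which is exactly why the hard part of the test is the matching and generation steps this sketch stubs out.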
Lastly, I thought the Latent Space interview with Kanjun Qiu of Imbue (a company that researches and builds agents) was great. She spoke about the challenges of building autonomous agents. It’s easy to anthropomorphize any prompt chain into an “agent,” but the key will be identifying which types of reasoning LLMs are good at and how to use them within a software agent’s cognitive architecture. It sounds like Imbue is tackling this problem by building internal agents and pushing them to their limits.