Essay | May 8, 2026 | 15 min read

How to get better work from AI agents

A practical guide to working with AI agents, based on real conversations and the patterns that turned prompts into workflows.

I asked the AI agents I use to read through my local chat history. I wanted to understand a shift I could feel in my work, but had not written down yet.

I knew I was using AI a lot. I was using it for coding, debugging, design implementation, documentation, PRs, internal research, and small workflow problems that are hard to explain until you see them in motion. But the interesting part was not the volume. The interesting part was that my usage had changed.

At first, I was mostly asking AI to do things. Build this, fix that, copy this, make it match. Over time, the conversations became less about isolated tasks and more about a way of working. I was asking the agent to inspect reality, preserve intent, debug systems, review its own output, use the right memory, and eventually help me create better instructions for future agents.

So this is a tips guide, but not in the abstract. It is based on a few hundred real agent conversations from my local history, with the examples shaped for public reading and focused on the behavior rather than the private details behind the work.

The local analysis covered 300+ parent chat threads and 100+ hours of active collaboration time. That is enough for the patterns to reflect more than one or two memorable chats.

1. Define the boundary before the task

One of the first patterns that shows up is boundary-setting. The useful prompt is not just "make this thing." It is "make this thing without changing that other thing."

In one early thread, I asked the agent to duplicate an existing product surface, create a separate version, and make sure future changes to the new version did not affect the original. The note to the agent was simple:

make sure changes to this new variant do not affect the other concepts.

That is already a better instruction than "copy this." It gives the agent a blast radius.

The pattern:

Name what can change.
Name what must not change.
Name the route, file, component, variant, or workflow that owns the change.
Ask for verification that the boundary held.

This became one of the clearest differences between casual AI usage and useful AI usage. The agent is much better when it knows the edge of the work.
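
The last item, verifying that the boundary held, can be made mechanical. Below is a minimal sketch. It assumes the boundary is written as glob patterns and that the agent can list the files it changed (for example with `git diff --name-only`). `check_boundary` and the folder names are hypothetical.

```python
from fnmatch import fnmatchcase


def check_boundary(changed_paths, allowed, protected):
    """Return every changed path that crosses the declared boundary.

    A path is a violation if it matches a protected pattern, or if it
    matches none of the allowed patterns.
    """
    violations = []
    for path in changed_paths:
        if any(fnmatchcase(path, p) for p in protected):
            violations.append(path)
        elif not any(fnmatchcase(path, p) for p in allowed):
            violations.append(path)
    return violations


# Hypothetical layout: the new variant owns its own folder,
# and the original concepts must not change.
changed = [
    "src/concepts/variant_b/page.tsx",
    "src/concepts/variant_b/styles.css",
    "src/concepts/original/page.tsx",
]
print(check_boundary(
    changed,
    allowed=["src/concepts/variant_b/*"],
    protected=["src/concepts/original/*"],
))  # ['src/concepts/original/page.tsx']
```

An empty list means the boundary held. Anything else goes back to the agent as evidence, which is a sharper correction than "you touched the wrong thing."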

2. Treat fidelity as a contract

AI is very good at plausible output. That is also the problem.

In design work, plausible is often wrong. The spacing can be plausible and still not match the source. The icon treatment can be plausible and still be invented. The copy can be fluent and still fail the product intent.

That is why I started making fidelity explicit. In one thread I wrote:

these two should be 1:1 copies as of now.

In another, the direction was even clearer:

use the assets from the source design and do not take liberties.

The pattern:

If the job is exploration, let the model propose.
If the job is implementation, make the source of truth explicit.
If a design, screenshot, or existing route is the reference, treat divergence as a defect.
Do not let "looks good" replace "matches the source."

The best phrasing is direct: match the source of truth unless I explicitly say we are exploring.

3. Move from adjectives to evidence

"This feels off" is a valid human reaction, but it is a weak debugging input.

The better version is evidence. DOM paths. Component names. Screenshots. Recordings. Browser-edited CSS. Failing commands. Logs. Check names. Exact moments in a flow.

One debugging prompt said:

analyze the recording frame by frame to detect why we have slight jumps and glitches, then fix it in a scalable way.

That prompt is doing several things at once:

It gives the agent the artifact.
It names the symptom.
It asks for analysis before the fix.
It blocks a hardcoded patch.
It asks the agent to connect the visual issue back to the system.

This is where AI starts feeling less like a chatbot and more like a working partner. The quality of the answer changes when the input is observable.
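
Once a recording is reduced to numbers, "slight jumps" becomes something you can point at. Below is a minimal sketch. It assumes the element's top offset has already been measured at each frame timestamp. `find_jumps`, the 4 px threshold and the sample data are hypothetical and do not come from the original thread.

```python
def find_jumps(frames, max_step_px):
    """Flag frame-to-frame moves larger than max_step_px.

    frames is a list of (timestamp_ms, top_px) pairs for one element,
    in recording order. Returns (from_ms, to_ms, step_px) for each jump.
    """
    jumps = []
    for (prev_t, prev_y), (t, y) in zip(frames, frames[1:]):
        step = y - prev_y
        if abs(step) > max_step_px:
            jumps.append((prev_t, t, step))
    return jumps


# Hypothetical measurements of an element sliding up at roughly 60 fps.
frames = [(0, 120), (16, 118), (33, 116), (50, 114), (66, 96), (83, 94), (100, 92)]
print(find_jumps(frames, max_step_px=4))  # [(50, 66, -18)]
```

The flagged pair of timestamps is evidence the agent can tie back to a specific state transition. Without it, the agent is left guessing from "it feels janky."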

4. Debug with traces, not vibes

The debugging conversations are some of the most useful parts of the archive because they show a technical pattern that transfers well.

The basic move is simple: bring the trace, then ask the agent to identify the mechanism.

In one interaction bug, the issue was not just that a sticky element felt wrong. The real question was when it should become sticky, what it should attach to, and how that should relate to scroll position. The correction captured the problem clearly:

the behavior is backwards; it should stick once I scroll past it, not the other way around.

That is a state-machine correction, not just visual feedback.
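
Framed as state, the correction comes down to a single comparison. Below is a minimal sketch, built on two assumptions:

- The element has one fixed threshold: its original top offset in pixels.
- The bug was an inverted condition.

Both function names are hypothetical. In a real UI this logic would live in the front-end code, not in Python.

```python
def should_stick(scroll_y, element_top):
    """The element sticks only once the page has scrolled past its original top."""
    return scroll_y >= element_top


def buggy_should_stick(scroll_y, element_top):
    # Hypothetical shape of a "backwards" bug: the comparison is inverted,
    # so the element is sticky at the top of the page and releases once
    # you scroll past it.
    return scroll_y < element_top


positions = [0, 200, 399, 400, 800]
print([should_stick(y, 400) for y in positions])        # [False, False, False, True, True]
print([buggy_should_stick(y, 400) for y in positions])  # [True, True, True, False, False]
```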

The debugging pattern:

Failure type	Better evidence	Better question
Motion feels janky	Recording, frame timestamps, affected element	Which frame or state transition creates the jump?
Layout looks wrong	Screenshot, DOM path, component name	Which layout contract is being violated?
Rendered data is wrong	Payload, schema, rendered UI	Where does the data contract diverge from the renderer?
CI fails	Check name, logs, changed files	Is this code, formatting, workflow config, or permissions?
Access feels broken	Permission state, route, API response	Which state is stale, missing, or out of sync?

The important shift is that the agent is not only patching. It is helping classify the bug.

5. Ask for the mechanism before the fix

When a bug repeats, another patch is usually not enough.

The useful question becomes: what is the underlying mechanism? Is it layout? State ordering? Scroll anchoring? Data shape? Auth state? Rendering sequence? A missing source of truth?

When I ask for the mechanism, the agent has to explain the cause instead of only producing code. That makes it easier to spot shallow fixes.

The reusable prompt looks like this:

Before changing code, explain what is causing this behavior. Then patch the root cause in a way that works beyond this one screenshot. Then verify against the original evidence.

This prompt shape works especially well when the agent has already failed once. It slows the system down enough to stop thrashing.

6. Compare contracts when systems disagree

A lot of technical AI work is contract comparison.

Sometimes the problem is not "this is broken" in a generic way. The problem is that two things that should agree no longer agree. A design and an implementation. A tool payload and a rendered widget. A generated message and the underlying data. A permission API and the app state. A repository rule and the actual PR behavior.

One prompt said:

check that the way we render responses matches the responses returned from the tool, and that we render them in the same order.

That is a strong prompt because it defines the comparison.

The reusable version:

Compare A against B. Identify every mismatch. For each mismatch, tell me whether the contract, the data, the renderer, or the documentation should change.

This turns the agent into a diff engine with judgment. It works for UI, APIs, workflows, docs, and generated content.
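
The core of that comparison fits in a few lines. Below is a minimal sketch. It assumes both the tool payload and the rendered items are available as lists of dicts keyed by an `id` field. `compare_contracts` and the sample data are hypothetical. The sketch only lists mismatches. Deciding whether the contract, the data, the renderer or the documentation should change is still the judgment step.

```python
def compare_contracts(returned, rendered, key="id"):
    """List every mismatch between what a tool returned and what the UI rendered.

    Both arguments are lists of dicts that share an identifying key.
    """
    mismatches = []
    returned_ids = [item[key] for item in returned]
    rendered_ids = [item[key] for item in rendered]

    # Presence: items that exist on only one side.
    mismatches += [f"missing in render: {i}" for i in returned_ids if i not in rendered_ids]
    mismatches += [f"rendered but not returned: {i}" for i in rendered_ids if i not in returned_ids]

    # Order: compare only the items present on both sides.
    shared_returned = [i for i in returned_ids if i in rendered_ids]
    shared_rendered = [i for i in rendered_ids if i in returned_ids]
    if shared_returned != shared_rendered:
        mismatches.append(f"order differs: returned {shared_returned}, rendered {shared_rendered}")

    # Fields: values that changed between the tool and the renderer.
    rendered_by_id = {item[key]: item for item in rendered}
    for item in returned:
        other = rendered_by_id.get(item[key])
        if other is None:
            continue
        for name, value in item.items():
            if other.get(name) != value:
                mismatches.append(f"{item[key]}.{name}: returned {value!r}, rendered {other.get(name)!r}")
    return mismatches


tool = [{"id": "a", "title": "Revenue"}, {"id": "b", "title": "Churn"}, {"id": "c", "title": "Growth"}]
ui = [{"id": "b", "title": "Churn"}, {"id": "a", "title": "Rev."}]
for line in compare_contracts(tool, ui):
    print(line)
```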

7. Keep taste in the loop

AI can generate quickly, but taste still needs a human owner.

The chats are full of course correction. The text does not build enough value. The animation feels too fast. The sequence is not readable. The design is not faithful. The assistant text is repeating what the widget already says. The flow is not connecting narrative and data.

In a content pass, I wrote:

make sure the changes are consistent with the narrative and data.

In another moment, the direction was:

the widget can give the overview, but the assistant should distill information too.

That is the right role for AI in product and design work. The agent can produce, but the human still directs intent.

The pattern:

Ask whether the copy adds value beyond the UI.
Ask whether the animation helps the user understand what changed.
Ask whether the design matches the source rather than just looking acceptable.
Ask whether the narrative and data agree.

The model can fill space. The work is making sure the space is worth filling.

8. Use specialized rubrics when the work depends on judgment

"Make it better" is too vague. A rubric turns taste into something the agent can act on.

For motion, I asked the agent to use an animation-specific rubric and then gave feedback like:

the typing feels too fast; make it slower so we can notice the sequencing of the elements appearing.

That is more useful than saying "smooth it out." It names the experience I want the user to have.

Useful rubrics:

Motion: timing, easing, sequencing, reduced motion, jank.
Writing: clarity, flow, tone, repetition, reader value.
Accessibility: keyboard flow, contrast, semantics, reduced motion.
Security: secrets, permissions, auth state, unsafe automation.
Agent docs: trigger, scope, examples, validation, ambiguity.

Rubrics do not replace judgment. They make judgment legible to the agent.
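
A rubric is easy to keep as data and hand to the agent as a checklist. Below is a minimal sketch that reuses the criteria listed above. The `RUBRICS` table, `review_prompt` and the prompt wording are hypothetical.

```python
RUBRICS = {
    "motion": ["timing", "easing", "sequencing", "reduced motion", "jank"],
    "writing": ["clarity", "flow", "tone", "repetition", "reader value"],
    "accessibility": ["keyboard flow", "contrast", "semantics", "reduced motion"],
    "security": ["secrets", "permissions", "auth state", "unsafe automation"],
    "agent docs": ["trigger", "scope", "examples", "validation", "ambiguity"],
}


def review_prompt(lens, artifact):
    """Turn one rubric into a review request the agent can act on."""
    if lens not in RUBRICS:
        raise ValueError(f"no rubric for {lens!r}")
    lines = [
        f"Review {artifact} against the {lens} rubric.",
        "For each criterion, say whether it passes and point to the evidence:",
    ]
    lines += [f"- {criterion}" for criterion in RUBRICS[lens]]
    return "\n".join(lines)


print(review_prompt("motion", "the typing animation"))
```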

9. Use plans to separate thinking from execution

Plans are useful when the work has shape.

For tiny edits, planning can be ceremony. But when a task touches multiple files, has product consequences, or needs to preserve a boundary, a plan becomes a shared object between me and the agent. We can agree on scope, steps, and verification before anything moves.

One instruction captured the execution mode:

Implement the plan as specified. Do not edit the plan file. Mark todos as in progress as you work.

The pattern:

Inspect first.
Plan the change.
Execute against the plan.
Track progress.
Verify against the original goal.

The value is not the plan file. The value is the separation of modes.
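
The separation of modes can be sketched as a plan whose steps are fixed, where the only mutable part is each todo's status. This is a hypothetical sketch, not the format of any particular agent tool. The rule of one todo in progress at a time is an added assumption.

```python
from dataclasses import dataclass, field


@dataclass
class Todo:
    step: str
    status: str = "pending"  # pending -> in_progress -> done


@dataclass
class Plan:
    goal: str
    todos: list[Todo] = field(default_factory=list)

    def start(self, index):
        # Assumption: one todo in progress at a time keeps progress readable.
        if any(todo.status == "in_progress" for todo in self.todos):
            raise RuntimeError("finish the current todo before starting another")
        if self.todos[index].status != "pending":
            raise RuntimeError("only a pending todo can be started")
        self.todos[index].status = "in_progress"

    def finish(self, index):
        if self.todos[index].status != "in_progress":
            raise RuntimeError("only an in-progress todo can be finished")
        self.todos[index].status = "done"

    def remaining(self):
        return [todo.step for todo in self.todos if todo.status != "done"]


plan = Plan("Add a separate variant", [
    Todo("Inspect the existing routes"),
    Todo("Duplicate the component"),
    Todo("Verify the original is unchanged"),
])
plan.start(0)
plan.finish(0)
plan.start(1)
print(plan.remaining())  # ['Duplicate the component', 'Verify the original is unchanged']
```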

10. Turn repeated corrections into rules

At some point, I stopped only correcting the current agent and started asking how to make future agents avoid the same mistake.

One prompt said:

how can we make it clear that before any agent starts a new feature, it checks the scalability changes we made?

That is a different move. The goal is not only to fix the current output. The goal is to encode the lesson.

The same pattern appears in the guidelines work. I asked for a Markdown document with examples and clear instructions for how agents should generate a certain kind of product text. Then I clarified that the document was not technical documentation:

a guideline for other agents.

The pattern:

A repeated preference becomes a rule.
A repeated review checklist becomes a skill.
A repeated setup flow becomes a command or script.
A repeated explanation becomes documentation.
A repeated handoff problem becomes a structured handoff.

A good prompt solves a moment. A good artifact improves the next moment too.

11. Use the right memory before reasoning

The agent's answer is only as good as the memory it uses.

For general design thinking, model knowledge can be enough. For code, the agent should inspect the local codebase. For organization-specific process, architecture, or conventions, it should use internal knowledge. For my own working style, past chats become the memory.

One prompt said:

look into internal knowledge for how we render widgets currently and whether this schema idea makes sense.

The pattern:

Code question: read the code.
Design question: inspect the design or screenshot.
Org question: search internal knowledge.
Workflow question: inspect the existing workflow.
Personal style question: inspect previous examples.

The model can sound confident from the wrong memory. The fix is to route the question before answering it.
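
Routing can be as simple as a lookup that runs before the question is answered. Below is a minimal sketch built from the five routes above. The category names, `route` and `routed_prompt` are hypothetical. Classifying the question in the first place is left to the human or the agent.

```python
ROUTES = {
    "code": "read the relevant code in the local repository",
    "design": "inspect the design or screenshot",
    "org": "search internal knowledge",
    "workflow": "inspect the existing workflow",
    "personal style": "inspect previous examples",
}


def route(question_type):
    """Pick the memory source for a question before any reasoning happens."""
    if question_type not in ROUTES:
        raise ValueError(f"unrouted question type {question_type!r}; classify it first")
    return ROUTES[question_type]


def routed_prompt(question, question_type):
    return f"Before answering, {route(question_type)}. Then answer: {question}"


print(routed_prompt("Does this schema fit how we render widgets today?", "org"))
```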

12. Build better handoffs instead of writing longer prompts

Some of the best AI work was not code or copy. It was improving the handoff into the agent.

There was a thread where the goal was to make a visual annotation workflow better for Cursor. The agent needed to receive an annotated screenshot and structured text that connected pins to comments. The expectation was:

the screenshot with annotation pins plus structured text that ties each annotation to the comment.

That example matters because it changes the problem. Instead of repeatedly explaining visual feedback to the agent, improve the system that sends feedback to the agent.
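
Below is a minimal sketch of that structured text. It assumes each pin has a visible number drawn on the screenshot and pixel coordinates. The `Pin` fields, `build_handoff` and the JSON shape are hypothetical, not the format the original workflow used.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class Pin:
    number: int   # label drawn on the screenshot
    x: int        # pin position in screenshot pixels
    y: int
    comment: str


def build_handoff(screenshot_path, pins):
    """Pair an annotated screenshot with text that ties each pin to its comment."""
    numbers = [pin.number for pin in pins]
    if len(numbers) != len(set(numbers)):
        raise ValueError("pin numbers must be unique so every comment maps to one pin")
    payload = {
        "screenshot": screenshot_path,
        "annotations": [asdict(pin) for pin in sorted(pins, key=lambda pin: pin.number)],
    }
    return json.dumps(payload, indent=2)


print(build_handoff("review.png", [
    Pin(2, 40, 310, "Spacing here does not match the source"),
    Pin(1, 120, 80, "Icon is invented; use the asset from the design"),
]))
```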

The pattern:

If context is hard to explain, structure it.
If screenshots need interpretation, annotate them.
If future agents need the same setup, create a handoff format.
If the agent keeps missing context, improve the context pipeline.

Sometimes the better prompt is not a prompt. It is a better input surface.

13. Protect the work from the context window

Long AI sessions have a hidden failure mode: the work can outgrow the context window.

When that happens, the risk is not only that the agent forgets a detail. The risk is that it forgets the shape of the work: what was decided, what was tried, what failed, what still needs verification, and what should not be touched. A fresh context can be useful, but only if it starts with the right handoff.

The reusable prompt looks like this:

Before the context gets too full, write a handoff document for the next agent. Include the goal, current state, decisions made, files changed, what was tried, what still needs to be done, verification status, and any constraints the next agent must preserve.

The pattern:

Ask for the handoff before the context collapses.
Include decisions, not just tasks.
Include failed attempts, not just successful changes.
Include verification status and remaining risk.
Include explicit "do not touch" constraints.

This is one of those AI habits that feels small until it saves a project. A good handoff lets the next context start with continuity instead of archaeology.
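
The handoff prompt above maps naturally onto a fixed template, so no section gets skipped when the context is already crowded. Below is a minimal sketch, assuming a Markdown handoff file. The `Handoff` class and its field names are hypothetical and simply mirror the list in the prompt.

```python
from dataclasses import dataclass, field


@dataclass
class Handoff:
    goal: str
    current_state: str
    decisions: list[str] = field(default_factory=list)
    files_changed: list[str] = field(default_factory=list)
    tried: list[str] = field(default_factory=list)  # include failed attempts
    remaining: list[str] = field(default_factory=list)
    verification: str = "not verified yet"
    do_not_touch: list[str] = field(default_factory=list)

    def to_markdown(self):
        def section(title, items):
            # An explicit "none" shows the section was considered, not forgotten.
            body = "\n".join(f"- {item}" for item in items) or "- none"
            return f"## {title}\n{body}"

        return "\n\n".join([
            "# Handoff",
            f"## Goal\n{self.goal}",
            f"## Current state\n{self.current_state}",
            section("Decisions made", self.decisions),
            section("Files changed", self.files_changed),
            section("What was tried", self.tried),
            section("Still to do", self.remaining),
            f"## Verification status\n{self.verification}",
            section("Do not touch", self.do_not_touch),
        ])


note = Handoff(
    goal="Fix the jump in the card animation",
    current_state="Root cause identified; patch not applied",
    tried=["Hardcoded offset (rejected as not scalable)"],
    do_not_touch=["Original concept pages"],
)
print(note.to_markdown())
```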

14. Let the agent operate inside the workflow

The later chats show more operational delegation. I ask the agent to inspect PRs, check failing validations, look into potential secrets, push changes, understand blocked merges, and fix failures so future PRs do not hit the same issue.

One prompt said:

check the PR again and see what is failing, then fix that for future PRs.

This is different from asking AI to write a function. It is asking the agent to stay inside the workflow.

The pattern:

Find the issue.
Fix the issue.
Verify the fix.
Explain what changed.
Prevent the same failure from recurring.

This still needs supervision. The point is not to let the agent merge whatever it wants. The point is to give it a real operational outcome and require it to inspect status, read checks, respect rules, and report back clearly.

15. Review the agent from multiple angles

Generic review produces generic feedback. Specific review lenses work better.

In the guidelines work, I asked for compactness, ordering, agent suitability, portability, and later a polished export. The instruction was:

use this doc-authoring skill and make sure we are not adding fluff.

The pattern:

Review for correctness.
Review for reader flow.
Review for security.
Review for fidelity.
Review for whether future agents can follow it.
Review for whether it is too generic.

The agent does better when review is not a single vague pass.

16. Create catalogs, not just files

Reusable work becomes more valuable when people can find it.

In one workflow, I did not stop at creating skills. I asked for a catalog, search, install flow, update flow, PR protection, ownership, and validation. The clarification was:

the user should be able to install specific skills, not all the skills in the repo.

That is the difference between an artifact and a system.

The pattern:

A file is useful once.
A catalog makes it discoverable.
A validation workflow makes it trustworthy.
Ownership rules make it maintainable.
Install and update paths make it portable.

If AI helps create something valuable, ask how someone will find it, trust it, update it, and reuse it.
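
Selective install is the piece that separates a catalog from a folder of files. Below is a minimal sketch. It assumes each skill is a single file in a source directory and that the catalog is a dict that also records an owner. The skill names, `search` and `install` are hypothetical.

```python
import shutil
from pathlib import Path

# Hypothetical catalog: each entry records a file, a description, and an owner.
CATALOG = {
    "motion-review": {"file": "motion-review.md", "description": "Rubric for animation timing and jank", "owner": "design"},
    "handoff": {"file": "handoff.md", "description": "Template for context-window handoffs", "owner": "platform"},
    "secret-scan": {"file": "secret-scan.md", "description": "Checklist before publishing", "owner": "security"},
}


def search(catalog, term):
    """Return skill names whose name or description mentions the term."""
    term = term.lower()
    return [name for name, meta in catalog.items()
            if term in name.lower() or term in meta["description"].lower()]


def install(catalog, names, source_dir, target_dir):
    """Copy only the requested skills into target_dir, not the whole repo."""
    unknown = [name for name in names if name not in catalog]
    if unknown:
        raise KeyError(f"not in catalog: {unknown}")
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    installed = []
    for name in names:
        source = Path(source_dir) / catalog[name]["file"]
        shutil.copy2(source, target / source.name)
        installed.append(source.name)
    return installed


print(search(CATALOG, "jank"))  # ['motion-review']
```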

17. Keep privacy and security in the workflow

AI makes it very easy to turn private work into public prose. That also makes it easy to leak private context.

The chats include several moments where credentials, access, permissions, and possible secrets became part of the work. One note was direct:

I saw something about a potential secret; check that.

The pattern:

Do not paste secrets unless absolutely necessary.
Ask the agent to identify sensitive values before publishing.
Replace exact internal names with roles or categories.
Remove private URLs, repo names, PR numbers, and customer-like details.
Keep the lesson, remove the identifying surface area.

Concrete examples are useful. Concrete leaks are not.
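
Asking the agent to identify sensitive values works better with a mechanical first pass. Below is a minimal sketch built on a few illustrative patterns:

- the AWS access key ID prefix `AKIA`
- the `ghp_` prefix of GitHub personal access tokens
- `key=value` style assignments
- any URL

The pattern set is an assumption and far from exhaustive. Treat it as a pre-check ahead of a dedicated secret scanner and a human read, not as a replacement for them.

```python
import re

# Illustrative patterns only; real secret formats are far more varied.
PATTERNS = {
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "github_token": re.compile(r"\bghp_[A-Za-z0-9]{36}\b"),
    "assigned_secret": re.compile(r"(?i)\b(?:api[_-]?key|secret|token|password)\s*[:=]\s*\S+"),
    "url": re.compile(r"https?://\S+"),
}


def find_sensitive(text):
    """Return (label, matched_text) for everything that looks sensitive."""
    findings = []
    for label, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            findings.append((label, match.group()))
    return findings


def redact(text):
    """Replace each match with a category placeholder, keeping the sentence readable."""
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"[{label.upper()}]", text)
    return text


print(redact("Deploy with token=abc123 and see https://internal.example.com/pr/42"))
# Deploy with [ASSIGNED_SECRET] and see [URL]
```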

18. Ask AI to analyze how you use AI

The final pattern is the most meta, and also one of the most useful.

This article came from asking my personal AI coding assistant to inspect my own chats, extract patterns, include actual examples, show the evolution, add a source-of-belief layer, avoid private disclosure, and then turn the article into a shareable skill. The request was:

look across all of our chats, distill learnings, include actual examples, and show the evolution of how I use AI.

That is worth doing because your best AI workflow is probably already visible in your usage.

Look for:

What instructions do you repeat?
Where do agents fail you?
What evidence improves the output?
Which tasks are becoming workflows?
Which preferences should become rules?
Which artifacts should become skills?

The prompt that captures it is: study how I work and show me the operating system I am accidentally building.
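
The first question on that list, "What instructions do you repeat?", is the easiest to automate. Below is a minimal sketch, assuming you can export your own messages grouped by thread. It only catches exact repeats after normalizing case and punctuation, so paraphrased instructions would need a fuzzier match. The function names are hypothetical, and the article does not say how its own analysis was implemented.

```python
import re
from collections import Counter


def normalize(message):
    """Lowercase, drop punctuation, and collapse whitespace so near-identical phrasings match."""
    without_punctuation = re.sub(r"[^\w\s]", "", message.lower())
    return re.sub(r"\s+", " ", without_punctuation).strip()


def repeated_instructions(threads, min_threads=2):
    """Return (instruction, thread_count) for messages that recur across threads.

    threads is a list of threads, each a list of your own messages.
    Each instruction is counted at most once per thread.
    """
    counts = Counter()
    for messages in threads:
        counts.update({normalize(m) for m in messages})
    return [(text, n) for text, n in counts.most_common() if n >= min_threads]


threads = [
    ["Make sure changes do not affect the other concepts.", "fix it"],
    ["make sure changes do NOT affect the other concepts", "looks good"],
    ["Fix it!"],
]
print(repeated_instructions(threads))
```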

Some final thoughts

The practical lesson is not "write better prompts." That is too small.

The bigger shift is designing the collaboration around the agent. Give it a boundary. Give it evidence. Give it the right memory. Ask for the mechanism. Review with a lens. Turn repeated corrections into reusable artifacts. Keep privacy in the loop.

And when the work gets long, protect the continuity of the work itself. A fresh context is only useful if the next agent knows what it is inheriting.

That is how AI becomes more than a faster text box. It becomes a layer across the work: implementer, debugger, reviewer, researcher, archivist, and eventually a way to encode how future agents should behave.

I hope this is useful as a guide that stays grounded in the messiness of actual product work. There is still a lot to learn here, especially around how designers can turn taste, critique, and product judgment into better instructions for agents without giving up the parts of the work that should stay human.

If anyone is brave enough to try this, here is the standalone skill you can install.

If you want to install it yourself:

curl -fsSL https://www.pologarcia.is/skills/install.sh | bash

If you want an agent to install it for you, paste this prompt:

Install this reusable agent skill from https://www.pologarcia.is/skills/pologarcia.md. Add it wherever this tool stores persistent skills, memories, or reusable instructions, and name it pologarcia. Keep the file intact, do not rewrite the guidance, and tell me where you installed it. If this tool does not support skills, save it as a reusable local instruction file and explain how I can enable it.