A Bad Prompt Wrangler Blames their Agent
Effective use of a Large Language Model requires effective prompting. Anthropic’s new Skills feature lets enterprises and users provide structured context that Claude loads dynamically, making prompting more reliable, consistent, and performant.
The familiar adage ‘A bad workman blames their tools’ can be extended to ‘A bad manager blames their team’, ‘A bad stock picker blames their luck’ and so on. There are of course nuances; the worst workmen may blame even the best tools, and the best workman will produce a good outcome despite terrible tools.
In 2025 the most lauded and most decried tool is variously known as a Large Language Model (LLM), generative AI, an AI agent, or simply ChatGPT (in the same way that a vacuum cleaner is still often called a Hoover, much to the chagrin of James Dyson).
The effectiveness of this tool is partly intrinsic; GPT-5 is to GPT-3 what an M16 assault rifle is to a flintlock musket. But a bad workman can still make a complete hash of using a good tool and vice versa; would you bet on me with an M16 against a Navy SEAL armed with a flintlock musket?
With that background, let’s talk about prompt wrangling – the often murky and iterative art of refining your questions or instructions to get your agent to actually do your bidding. Some aspects of good prompting are fairly intuitive. It’s no more reasonable to expect an AI agent recommending a restaurant to divine your culinary tastes than it is to expect a restaurateur to realize you are vegan simply by the cut of your jib or the weave of your sweater (thankfully gone are the days when vegans were stick thin with often questionable taste in clothing). But it is surprising how much better an agent can be if you provide it with sufficiently precise, well-crafted guidance. This includes reminding it of stuff you only told it yesterday, or stuff you might think is bleeding obvious.
Fortunately some of this repetitious prompting is done for us in the system prompt – many pages of context prepended by the system to every question or instruction before passing the combined prompt to the LLM. Just to highlight how basic some of this is, the current Claude system prompt starts with “You are Claude…”. Accepting that your agent is prone to forgetting its own name is a useful mindset when thinking about how to get the best results.
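For the curious, this is roughly what it looks like when you call a model directly through an API rather than a chat window – a minimal sketch using Anthropic’s Python SDK, with an illustrative model name and a drastically abbreviated stand-in for the system prompt:

```python
import anthropic

# The SDK reads ANTHROPIC_API_KEY from the environment.
client = anthropic.Anthropic()

response = client.messages.create(
    model="claude-sonnet-4-5",  # illustrative model name
    max_tokens=1024,
    # The system prompt is sent alongside every request; the real one runs to many pages.
    system="You are Claude. You recommend restaurants. The user is vegan.",
    messages=[
        {"role": "user", "content": "Where should I eat in Lisbon tonight?"},
    ],
)
print(response.content[0].text)
```

Every single request carries that standing context with it, which is why the model never has to be told twice who it is – at least within one exchange.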
Anyone using an AI coding agent such as Cursor, Claude Code or ChatGPT Codex will know the importance of good code documentation, a detailed style guide and a precise description of the workflow, including seemingly obvious things like “test your code rigorously before telling me it’s working”. This is often documented in the claude.md file for a particular project.
Whenever Claude forgets to do something (it is notoriously forgetful about comprehensive testing) I instruct it to write clear instructions to itself in my project claude.md file. Frustratingly, yesterday it still forgot to test, despite having recently written clear ‘always test’ instructions to itself in claude.md. Its proposed solution to this bad behaviour was to mark the instructions as critical. The lesson here is that it is genuinely helpful to use phrases like ‘very important’, ‘never, never’ or ‘Critical:…’.
I have seen claims that saying things like “People may die if you do not test properly” can also help, but there are other studies showing mixed results when using extreme prompting language or threats.
When a human worker starts a new job they are naturally diligent and thoughtful, but it’s not uncommon for the quality of their work to degrade over time. Yet many of us feel awkward, or fear sounding patronising, when repeatedly reminding them of their duties. We did once meet a guy with a total humour bypass who had absolutely no qualms providing instructions like “If you don’t dust under the sofa, people will die”, but unsurprisingly he was universally disliked by tradespeople, neighbours and even the estate agent selling his house. In contrast, AI agents have zero ego and are not programmed to ‘feel’ patronised or insulted. Whether or not you say ‘please’ to an AI agent makes little or no difference to the calibre of its work, and repeatedly reminding it of what you want done, without gratuitous threats, does pay dividends.
Nonetheless, the challenge, once you’ve got the hang of everything you need to say, and say again, is that a simple instruction can expand into a complete manual. This can be counterproductive, because the important instructions lose emphasis in a very long prompt and analysing the lengthy input burns more of your monthly quota. Fortunately the foundation model providers (OpenAI, Anthropic, Google, etc.) are competing to productionize and streamline effective use of their amazingly powerful models. Last week Anthropic announced Claude Skills, which allow enterprises or individual users to provide folders full of task instructions, branding guidelines, and even snippets of code to get repetitive work done more efficiently and predictably.
<SlightlyTechnical>
This Anthropic engineering blog explains the rationale for skills and their capabilities.
“Progressive disclosure is the core design principle that makes Agent Skills flexible and scalable. Like a well-organized manual that starts with a table of contents, then specific chapters, and finally a detailed appendix, skills let Claude load information only as needed:
[…] Agents with a filesystem and code execution tools don’t need to read the entirety of a skill into their context window when working on a particular task. This means that the amount of context that can be bundled into a skill is effectively unbounded.”
</SlightlyTechnical>
Anthropic’s higher-level introduction to skills says ‘Building a skill for an agent is like putting together an onboarding guide for a new hire’. But the difference is that every time Claude encounters a task it identifies the appropriate skill and re-reads it, so there is hopefully less chance of people dying due to forgotten dust under the sofa.
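To make the progressive-disclosure idea concrete, here is a rough conceptual sketch in Python – not Anthropic’s implementation, and the folder layout and field conventions are assumptions for illustration – of an agent that keeps only a one-line summary of each skill in context and reads the full instructions only when a task calls for them:

```python
from pathlib import Path

SKILLS_DIR = Path("skills")  # hypothetical layout: skills/<skill-name>/SKILL.md


def skill_summaries() -> dict[str, str]:
    """Load only each skill's first line (the cheap 'table of contents' kept in context)."""
    summaries = {}
    for skill_file in SKILLS_DIR.glob("*/SKILL.md"):
        # For this sketch we treat the first line as the short description; real skills
        # carry a name and description in metadata that a fuller implementation would parse.
        first_line = skill_file.read_text().splitlines()[0]
        summaries[skill_file.parent.name] = first_line
    return summaries


def load_skill(name: str) -> str:
    """Read the full instructions only once the agent decides this skill is relevant."""
    return (SKILLS_DIR / name / "SKILL.md").read_text()


# The agent always sees the table of contents...
print(skill_summaries())
# ...and pulls a specific chapter into its context only when the task demands it, e.g.:
# print(load_skill("brand-guidelines"))
```

The point is simply that the table of contents stays cheap while the full manual remains available on demand – which is what keeps the amount of bundled context “effectively unbounded”.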