

If You Can't Build AGI, Then Why Should We Hire You?
Hey everyone 👋,
It is fascinating how a single job interview question can encapsulate the defining neurosis of an entire industry. For months, I have been watching the engineering world collectively panic over a new filter masquerading as a baseline requirement: if you aren't actively building artificial general intelligence, why exactly are you here? While my piece Just Don't Pick Up the Brush explored the isolation of holding deep technical competence in a market that no longer knows how to classify it, and Technology Has Destroyed My Livelihood examined the systemic devaluation of foundational engineering, this essay zeroes in on the bizarre metric that sits at the intersection of both. The current start-up discourse, propelled heavily by venture capital narratives, has conflated AGI affiliation with fundamental technical worth. I am writing this to systematically dismantle that conflation. Answering the question requires naming some uncomfortable truths about what we are actually building versus what we are selling, but continuing to let the wrong metrics drive our careers is an error we cannot afford to sustain.
Before I develop the argument, I want to say something about the specific shape of this anxiety, because it is not random, and understanding where it comes from matters for evaluating whether it deserves the authority it has been given. The last three years have produced a sustained cultural pressure that says artificial general intelligence is imminently arriving, that it will make most existing engineering work obsolete, and that the only engineers who will survive the transition are those who are actively building toward it or deeply integrated with the systems that are closest to it. This pressure comes from real places: genuine capability jumps in language models, the economic dominance of companies like OpenAI and Anthropic and Google DeepMind, and the completely understandable anxiety of an engineering workforce watching automation creep up the skills ladder. But it is also being amplified by people who benefit from that anxiety: venture capitalists who need the narrative of impending disruption to justify frontier investments, startup founders who need to recruit at below-market rates by selling the dream of being part of something historical, and AI companies that need a cultural context in which the only legitimate technical work is work that feeds their systems. I am not saying those people are lying. I am saying that the incentives around a narrative matter for how that narrative gets shaped, and the incentives around AGI are enormous.
In Genuine Intelligence Will Never Emerge from Neural Networks, I argued at length that the current direction of AI development, scaled statistical learning over human-generated data, is architecturally incompatible with genuine intelligence in any sense that word deserves to carry. In Knowledge and Intelligence Are Mutually Exclusive, I argued that having access to information is not the same thing as understanding it, and that the conflation of those two things is the central mistake driving both AI hype and the misplaced anxiety around it. In All You Have Access To Is Knowledge and Tools; Never Intelligence, I argued that every AI system ever deployed has given its users knowledge and tools but not intelligence, because intelligence is the thing the human brings to the interaction. All of those arguments feed directly into the question in this post's title, because if genuine intelligence is not what AI systems have, and if genuine intelligence is not what companies can buy in the form of a language model API, then the engineer who can use their own genuine intelligence to build systems that work reliably and solve real problems is still the most valuable person in any technical organization. The question is not AGI or obsolescence. The question is whether you can build things that matter.
The Question Is Designed to Make You Feel Small
Let me say this plainly, because it is the most important thing in the entire post and the thing I most wanted to say when I started writing it. The question "if you can't build AGI, why should we hire you?" is not a technical question. It is a psychological one. It is the kind of question that only functions if you already accept the frame that makes it feel threatening, and that frame is this: the only work that counts is work at the frontier of artificial general intelligence, and everything else is just maintenance work that will be automated away. That frame is wrong on multiple levels, and I want to dismantle it carefully because it has been causing real damage to real engineers who are doing genuinely valuable work but who have been made to feel that their contributions are somehow insufficient by a cultural context that has decided only one thing counts. The question is designed to make you feel small, and it has been working.
The first thing wrong with the frame is that AGI does not have a settled definition, which means you cannot be evaluated against it as a standard. OpenAI says AGI is "AI systems that are generally smarter than humans" (1). DeepMind's Morris et al. defined it as a six-level hierarchy running from Level 0 (No AI) through Level 1 (Emerging), Level 2 (Competent), Level 3 (Expert), Level 4 (Virtuoso), and Level 5 (Superhuman), assessing both the depth of performance and the breadth of tasks a system can handle (2). MIT's 2025 AI Agent Index says explicitly that definitions of AI agents are "nebulous and differ across fields" and that even the word "agent" lacks a stable technical meaning across the research community (3). This is not a minor disagreement. These are the central organizations in the field, and they cannot agree on what AGI is, whether we have it, how close we are to it, or what empirical tests would determine whether a system qualifies. When a hiring question invokes a concept that its own field cannot define, it is not measuring something real. It is invoking a feeling, and feelings are not hiring rubrics. The person asking whether you can build AGI is asking whether you are affiliated with something prestigious and frontier-sounding, which is a question about social proof, not technical capability.
The second thing wrong with the frame is that companies do not need AGI. Companies need solutions to specific problems in specific domains with specific constraints. They need systems that work reliably within a budget, that can be maintained by the people who will inherit them, that handle failure gracefully, and that produce measurable value that justifies their cost. None of these requirements are AGI requirements. A customer support routing system that correctly classifies 92% of tickets and escalates the other 8% appropriately does not need to be generally intelligent. It needs to be correct, maintainable, and deployable in the production environment the company already has. An internal knowledge retrieval system that helps a legal team find relevant case precedents faster does not need human-level general reasoning. It needs to retrieve the right documents, surface them clearly, and handle the failure modes that arise when relevant documents do not exist. These are real engineering problems, and they require real engineering skill, not AGI-building ability. The conflation of the two is how companies end up hiring people based on narrative affiliation rather than on the actual skills the job requires.
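To make the routing example concrete, here is a minimal Python sketch of the "classify what you can, escalate the rest" pattern. It is an illustration under stated assumptions, not a description of any real system: the `Prediction` type, the `human_review` queue name, and the 0.80 threshold are all hypothetical, and the classifier that produces the confidence score is assumed to exist elsewhere.

```python
from dataclasses import dataclass
from typing import Sequence

# Hypothetical threshold; in practice it would be tuned on labelled tickets.
ESCALATION_THRESHOLD = 0.80


@dataclass
class Prediction:
    label: str         # queue the classifier proposes, e.g. "billing"
    confidence: float  # classifier's score for that label, in [0, 1]


def route_ticket(prediction: Prediction,
                 threshold: float = ESCALATION_THRESHOLD) -> str:
    """Return the destination queue, sending low-confidence tickets to a human."""
    if prediction.confidence >= threshold:
        return prediction.label
    return "human_review"


def escalation_rate(predictions: Sequence[Prediction],
                    threshold: float = ESCALATION_THRESHOLD) -> float:
    """Fraction of tickets that fall below the threshold and are escalated."""
    if not predictions:
        return 0.0
    escalated = sum(
        1 for p in predictions if route_ticket(p, threshold) == "human_review"
    )
    return escalated / len(predictions)
```

Nothing in this sketch is generally intelligent. The engineering work is in choosing the threshold, measuring the escalation rate against what the support team can absorb, and deciding what happens to a ticket once it lands in front of a person.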
The third thing wrong with the frame is that the people who ask the AGI question in interviews are almost never building AGI either. They are building features on top of API calls to large language models. They are building wrappers, prompts, evaluation pipelines, fine-tuning scripts, and retrieval systems. That is real work and some of it is genuinely difficult and genuinely valuable. But it is not AGI research, and calling it AGI research or demanding that candidates show AGI-building ability as a prerequisite for doing it is a form of credential inflation that benefits no one except the cultural positioning of the company doing the hiring. A frontend engineer does not need to have invented the rendering engine to build excellent interfaces. A backend engineer does not need to have written the database to design excellent schemas. The layer abstraction that makes software engineering productive depends on people using tools well without pretending to have built them from scratch. The AGI question inverts this by demanding that candidates prove mastery of a layer that does not yet exist in any usable form.
The psychological function of the question matters as much as its technical content, and I want to be honest about what that function is. The question filters out engineers who feel uncertain, who are not good at performing confidence in things they cannot verify, or who are honest about the limits of their knowledge. It selects for a certain kind of boldness that is correlated with self-promotion but not reliably correlated with technical ability. I described in An Empty Life Filled With Constant Suffering what it feels like to have genuine technical capability and not be able to perform the social confidence that the market rewards. The AGI question is a social performance test wearing a technical costume, and the engineers it selects are the engineers who are best at performing the right kind of confidence, not necessarily the ones who will build the most reliable systems or solve the hardest real-world problems. That is a bad hiring mechanism that produces bad outcomes, and I want to say so directly.
There is also a historical pattern worth noting here, because the AGI question did not emerge in a vacuum. Every period of technological transition produces a version of the question that positions the new technology as the only thing that matters and asks engineers to prove their alignment with it as the price of admission to the profession. In the early cloud era, you had to prove you understood distributed systems at scale even if you were building a simple CRUD application. In the mobile era, you had to prove you thought mobile-first even if your product was never going to be used on a phone. In the blockchain era, which I will not dwell on for the sake of everyone's blood pressure, you had to convince hiring managers that everything was about decentralization. The AGI version of this question is the latest in that sequence, and it will eventually be replaced by the next version, after enough companies have hired for AGI affiliation and discovered that what they actually needed was engineers who could ship reliable systems and communicate clearly with non-technical stakeholders. The cycle is predictable. The damage it does in the meantime is real.
The engineers who actually built the systems that powered the cloud era's most important products were not the engineers who wrote the seminal distributed systems papers. They were the engineers who read those papers, understood them well enough to apply the relevant parts to a specific problem, and built systems that worked well enough to ship and be maintained. The cloud era needed understanding of distributed systems principles, not the ability to invent new distributed systems theory from scratch. The AI era needs the same thing: engineers who understand the tools well enough to apply them appropriately, who can evaluate the outputs, who can design systems around the limitations, and who can maintain what they build as the underlying tools evolve. That is what the actual job requires. That is the hiring rubric that produces good engineering organizations. The AGI question is a distraction from that rubric, and some organizations have already started to figure that out the expensive way.
AGI Is a Moving Target
I want to spend some time on what AGI actually means, because the term is used so loosely and so frequently in the current conversation that it has become almost content-free while retaining strong emotional charge. That combination, strong feeling, weak definition, is exactly the combination that makes a term useful for rhetorical purposes and dangerous for technical ones. When I wrote Rethinking ARC-AGI, I tried to explain why even the most prominent empirical attempt to define and measure AGI, François Chollet's ARC benchmark, was not actually measuring what it claimed to measure, because it was measuring pattern recognition under specific constraints rather than the open-ended adaptive intelligence that the AGI label implies. That argument stands, and it has gotten more support since I wrote it, not less. The definition of AGI has been drifting in the research literature not toward consensus but away from it, with each major lab defining it in a way that happens to be aligned with their own products and roadmaps.
The definitional drift is not accidental. It serves a real function in the competitive landscape of AI research and development. If your company can credibly claim to be on the path to AGI, you have an enormous advantage in recruiting, in fundraising, in media attention, and in policy influence. That creates strong incentives to define AGI in a way that makes your current work look like progress toward it, and weak incentives to hold to a rigorous definition that might show your current work to be further from the goal than it appears. OpenAI's current framing places AGI as "AI systems that are generally smarter than humans", which is broad enough to interpret very generously as models get more capable (1). DeepMind's Morris et al. framework defines six levels of performance, from Level 0 (No AI) through Level 1 (Emerging), Competent, Expert, Virtuoso, and Level 5 (Superhuman), each assessed against the breadth of tasks a system can handle, which allows them to place current systems at the low end of the spectrum as "Emerging" AGI without committing to any claim about whether higher levels are imminent or even achievable (2). Anthropic tends to avoid the term entirely and talks about "powerful AI" instead, which is a different rhetorical choice but comes from the same place of definitional flexibility. None of these framings give you a clear, falsifiable prediction about what would count as AGI being achieved, which means none of them give you a clear, falsifiable hiring criterion.
The World Economic Forum's 2025 Future of Jobs Report is actually more useful for thinking about what the current period of AI development means for hiring than any of the lab-specific AGI framings, because it is written from the perspective of what organizational capability needs look like rather than from the perspective of what researchers hope to achieve (4). The report says that nearly 40 percent of skills required on the job are expected to change by 2030, that 63 percent of employers already cite the skills gap as a barrier to transformation, and that the skills employers are planning to grow are not exclusively technical ones. They include analytical thinking, resilience and flexibility, leadership and social influence, and creative thinking alongside AI and big data, technology literacy, and networks and cybersecurity. That is a very different picture from "hire for AGI fluency." It is a picture of organizations that need people who can think, adapt, communicate, and lead through technical change, not people who can write papers about general intelligence.
The International Labour Organization's 2025 update on generative AI and occupations makes an even more important corrective point, which is that very few jobs are fully automatable with current generative AI, and that the predominant pattern is task transformation rather than job replacement (5). That distinction matters more than almost anything else in the current conversation, because it repositions the question from "will AI replace this job?" to "which parts of this job will AI change, and what will the human do with the time freed by those changes?" The answer, in most knowledge-work domains, is that humans will do more of the high-judgment, high-context, high-stakes parts of the work, the parts that require understanding a specific organization's situation, that require building trust with specific people, and that require taking responsibility for outcomes in a way that no automated system can take. Those are the parts of the job that AGI cannot do even in principle given the current architectural limitations I described in Genuine Intelligence Will Never Emerge from Neural Networks. The jobs are changing, yes. They are not being replaced by AGI. They are being changed by tools that free up humans to do more of what humans are actually good at.
The specific claim that METR's 2025 study makes is worth examining in detail, because it is one of the most counterintuitive and important findings in the recent literature on AI and developer productivity (6). The study found that experienced open-source developers using the best available AI coding tools as of early 2025 took roughly 19 percent longer on their own repositories compared to a control condition without those tools. This is not a finding that says AI tools are useless. It is a finding that says the relationship between tool capability and productivity is not linear, and that in expert contexts with high contextual depth, the overhead of correctly prompting, reviewing, and integrating AI-generated outputs can exceed the time saved by not writing that code by hand. That is a real phenomenon, and it has enormous implications for how organizations should think about AI tool adoption, about what skills to hire for, and about what productivity metrics to use when evaluating engineers who use these tools. The best candidate is not the one who uses AI most aggressively. The best candidate is the one who understands when to use it and when not to, what its outputs need in terms of review, and how to fit it into a workflow that makes the whole system better rather than just the individual task faster.
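The overhead argument is simple arithmetic, and a small Python sketch makes it easier to see. Every number below is invented for illustration: the study reports the overall slowdown, not this breakdown, and the function and parameter names are hypothetical. The figures are chosen only so that the total lands on a 19 percent slowdown.

```python
def assisted_minutes(manual: float, typing_saved: float,
                     prompting: float, review: float,
                     integration: float) -> float:
    """Time for a task done with an AI tool.

    Starts from the manual time, subtracts the writing time the tool saves,
    then adds the overhead of prompting, reviewing, and integrating output.
    """
    return manual - typing_saved + prompting + review + integration


def slowdown(manual: float, assisted: float) -> float:
    """Relative change in task time; positive means the tool made it slower."""
    return (assisted - manual) / manual


# Illustrative numbers only (NOT taken from the study):
# a 100-minute task where the tool saves 30 minutes of writing
# but costs 49 minutes of prompting, review, and integration.
with_tool = assisted_minutes(manual=100, typing_saved=30,
                             prompting=15, review=24, integration=10)
change = slowdown(100, with_tool)  # 119 minutes against 100
```

The same model shows when the tool pays off: as soon as the three overhead terms sum to less than the writing time saved, `slowdown` goes negative. Knowing which side of that line a given task falls on is the judgment the paragraph above describes.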
The definitional instability of AGI also has direct consequences for how companies should think about technical leadership and strategy. A company that builds its hiring and product strategy around hitting AGI as a milestone is building on sand, because the milestone keeps moving and no independent observer can verify when it has been reached. A company that builds its hiring and product strategy around solving specific, measurable problems for specific customers in specific domains is building on solid ground, because those goals are clear, progress toward them is verifiable, and success is recognizable when it happens. I have been saying something like this since Announcing Kevin RS, where I described the design philosophy behind a framework built for specificity and reliability rather than for generality and hype. The argument has not changed. Specificity is how you build something real. Generality is how you build something impressive-sounding.
The Turing test is the oldest version of the AGI measurement problem, and it is worth pausing on because its history illustrates exactly the issue with building hiring criteria around an underspecified goal. Alan Turing proposed the imitation game in 1950 as a way of sidestepping the hard philosophical question of machine consciousness by substituting a behavioral test (7). The insight was genuinely brilliant: instead of asking whether a machine can think, ask whether its behavior is indistinguishable from a thinking entity. But the problem that became apparent over the following seven decades is that the Turing test measures the convincingness of outputs rather than the presence of understanding behind them, and those are not the same thing. Systems that have passed versions of the Turing test under specific conditions have done so by being good at producing the right kind of text, not by being genuinely intelligent. The same problem applies to AGI definitions based on behavioral outcomes: they optimize for producing the right impressions rather than for having the right internal structure. And at the level of hiring, optimizing for impression-making is exactly the failure mode I am describing throughout this post.
What the AGI frame has done to technical culture, and this is the thing that bothers me most about it, is shift the primary evaluation criterion from "can you build something that works" to "do you sound like someone who could theoretically build something that would eventually work." That is a shift from evidence to narrative, and narrative has always been cheaper to produce and harder to evaluate than evidence. I wrote in As Engineers, LLMs Should Pay Us for Tokens Usage about how engineers are the ones who create real value in AI systems while other actors extract that value through the framing of the market. The AGI hiring question is the same extraction in a different domain: it takes the genuine capability and judgment of working engineers and subordinates it to a narrative about frontier intelligence that few people in the industry are actually building and fewer still are building honestly. The resistance I am mounting in this post is the same resistance I mount in my code: insist that the thing works, insist that you can measure it, and refuse to accept impressive language as a substitute for demonstrated capability.
The Market Is Not Broken
I have heard too many engineers describe the current job market as completely broken, and while I understand where that feeling comes from and I have felt it myself in ways I described in Technology Has Destroyed My Livelihood, I want to offer a more precise diagnosis, because "broken" suggests randomness and the current market is not random. It is re-sorting. It is re-sorting around a specific set of properties that determine who survives and who does not in an environment where AI tools have changed what can be accomplished by an individual contributor and where the ceiling on what a small team can ship has risen dramatically in a short period. Understanding the re-sorting logic is more useful than describing the whole situation as broken, because understanding it tells you what to actually do.
The primary axis of re-sorting is between engineers who can demonstrate specific, verifiable outcomes and engineers who can only describe their capability in terms of their familiarity with fashionable tools or their proximity to prestigious institutions. That distinction has always existed in engineering hiring, but it has become much more visible and consequential in the last three years because the tooling has changed what the floor for competent-sounding output looks like. An engineer who can produce a reasonably coherent architecture diagram and a working prototype using language model tools has raised the floor for what "showing up with something" looks like in a technical interview or a project pitch. That means the bar is not in the appearance of competence but in the demonstration of genuine judgment, specifically the judgment about what to build, how to constrain it, how to verify it, and how to maintain it when the first version inevitably has problems. Those things are harder to fake than they have ever been, because the fakers are now using the same tools as the practitioners and the outputs look similar from the outside, which means you have to go deeper to see the difference.
The World Economic Forum data on this is worth reading carefully rather than treating as a bumper sticker (4). The 2025 report says employers are planning to hire for AI and big data skills, but it also says they expect to grow human capabilities like resilience, flexibility, curiosity and lifelong learning, and leadership and social influence. It says 85 percent of employers plan to prioritize upskilling their existing workforce in thinking and working with AI, which means the companies that are doing this right are not replacing human judgment with AI; they are augmenting human judgment with AI and investing in the humans who use that judgment well. The organizations that are just replacing humans with AI and calling the result an AI-forward strategy are in a different category, and I will say plainly that I think most of them will discover in two to three years that they degraded their organizational capability by removing the judgment that the tools lack. That is not a blind pro-human claim. It is a claim grounded in what AI systems can and cannot do at the architectural level, and I have made that argument in enough technical detail in previous posts that I will not repeat it all here.
The re-sorting is also happening along the axis of specialization versus generalism. For most of the last decade, the engineering job market rewarded generalists who could move between stacks, adapt to new frameworks quickly, and be productive across a wide range of problem domains. That premium on generalism was driven by the rapid pace of framework and tool change, which made deep specialization in any single technology risky because the technology might be irrelevant in three years. AI tools have changed this dynamic by raising the floor for what a generalist can produce, which means the premium has shifted toward people who have genuine depth in a specific domain, because that depth is what distinguishes their use of AI tools from a shallow user's use of the same tools. The cardiologist who knows the clinical literature deeply is in a very different position when using AI diagnostic tools than the medical student who has passed the basic examinations. Both can ask the AI the same questions. Only the cardiologist can evaluate the answers against a causal model of cardiac function built from years of clinical experience. That depth is irreplaceable by scale, and domain-specific depth is what engineers should be building right now, not because it sounds prestigious but because it is what makes their use of new tools genuinely better than the competition's use of the same tools.
The international dimension of this re-sorting matters too, because engineering talent is globally distributed in a way that the hiring practices of large Western technology companies have never fully accommodated and that AI tools are now making more complicated. I have been building and thinking from outside the geographic and institutional centers of the AI industry, and I wrote in An Empty Life Filled With Constant Suffering about what it is like to have serious technical capability and no institutional affiliation, no network connection, and no runway to perform indefinite unpaid proof of work. The market re-sorting I am describing does not automatically benefit people in that position, because verifiable outcomes are still verified by networks, and networks are still unevenly distributed by geography and institution. But the tools have lowered some of the barriers to producing the evidence in the first place, which means the task is building evidence that is verifiable by someone who does not already know you, and that is exactly the kind of portfolio, the shipped product, the open-source project with real users, the measurable improvement documented in public, that replaces the network effect when you do not have one.
The thing I want to say most directly about the re-sorted market is this: the engineers who are struggling the most are not struggling because AI has made their skills obsolete. They are struggling because the market is in an adjustment period where the signals that used to identify good engineers, years of experience, recognizable employer brands, familiarity with popular frameworks, have been disrupted faster than new signals have emerged to replace them. In that vacuum, AGI affiliation has rushed in as a cultural signal, and it is doing the job poorly because it measures narrative proximity rather than engineering capability. The engineers who will come out of this period in a strong position are the ones who did not wait for the market to figure out the right signal but instead built their own evidence. They shipped things. They measured them. They wrote about what they learned. They made their judgment visible in forms that a thoughtful hiring manager in any technical organization could evaluate and recognize. That is what I am trying to do with this blog, with the projects I am building, and with the framework I am going to spend several full sections of this post describing.
The METR study I mentioned earlier has another dimension that is worth bringing out, which is that the developers who did not see productivity losses from AI tools were the ones who had developed specific workflows for using them, specific evaluation habits, specific constraints on when to accept AI-generated code and when to reject it (6). That is a skill. It is not a skill that is visible from a resume headline or easily measured in a forty-five minute technical interview. It is a craft practice that builds over time with deliberate effort, and the engineers who have it are more valuable than the engineers who can pass the interview filter on AGI familiarity but have not developed the operational judgment to use AI tools well in a production context. The hiring practices that will survive this period are the ones built around evaluating that operational judgment rather than the ones built around evaluating AGI affiliation, and the organizations that get there first will be the ones with engineering workforces that are genuinely more productive and not just nominally more AI-native.
One more thing on the market, and then I want to move to something more constructive. The layoffs that have accompanied the re-sorting are real and they have hurt real people, including people I know and people whose situations resemble things I have lived through. But the layoffs are not primarily driven by AI making engineering work unnecessary. They are primarily driven by a period of over-hiring during 2020 to 2022 that was financed by near-zero interest rates and the specific conditions of the COVID period, followed by a normalization period as interest rates rose and the cost of capital became real again. AI is a contributing factor at the margin in some specific roles, but the dominant cause of the current dislocation is interest rate normalization and valuation correction, not AGI. Saying otherwise serves the narrative interests of people who want AI to seem more disruptive than it currently is, and I have spent enough time in this series resisting narratives that serve interests other than the truth to stay consistent about it here.
Everyone Has the Same Tools
I hear this claim constantly, and I want to deal with it directly because it is one of the most consequential pieces of misinformation circulating in technical culture right now. The claim is that the democratization of AI tools has leveled the playing field, that because everyone has access to GPT-5 or Claude or Gemini through an API, the advantage that experienced engineers used to have from their accumulated skill and domain knowledge has been eroded or eliminated. This claim is wrong in a way that is both easily demonstrated and deeply consequential for how engineers should think about their own value. Access to a tool and competence with a tool are completely different things, and the speed at which access has democratized has been confused with a democratization of competence that has not actually occurred. I want to explain exactly why, because I think this is one of the places where the AI conversation has gone most seriously off track.
The easiest demonstration of the gap between access and competence is the evaluation problem. When an engineer uses an AI tool to generate a piece of code, a data analysis, a system design, or a legal summary, the tool produces output. That output may be correct, partially correct, subtly wrong, confidently wrong, or genuinely excellent. The tool itself cannot reliably tell you which of these it is, because as I argued in Genuine Intelligence Will Never Emerge from Neural Networks, these systems are optimized for producing statistically likely outputs conditional on their training data, not for producing outputs whose correctness tracks the structure of reality. The engineer who has the domain expertise to evaluate the output can use the tool on a completely different level than the engineer who does not. The expert can accept the parts that are right, identify the parts that are wrong, and either fix them or avoid accepting them in the first place. The non-expert accepts the whole thing and ships it. Both engineers have equal access to the tool. Their outputs are completely different, and the difference is entirely determined by the human knowledge the engineer brings to the interaction, not by anything the tool does differently for one versus the other.
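A minimal Python sketch shows how the same generated output fares under two different reviewers. Everything here is hypothetical and chosen for illustration: `generated_is_leap_year` stands in for plausible-looking tool output, and the two check lists stand in for what a non-expert and a domain expert would each think to test.

```python
from typing import Callable, Iterable, Tuple

Check = Tuple[tuple, object]  # (arguments, expected result)


def accept_candidate(candidate: Callable, checks: Iterable[Check]) -> bool:
    """Accept generated code only if it passes every reviewer-supplied check."""
    for args, expected in checks:
        try:
            if candidate(*args) != expected:
                return False
        except Exception:
            # A crash on a reviewer's input counts as a failed check.
            return False
    return True


def generated_is_leap_year(year: int) -> bool:
    # Stand-in for tool output: looks right, but ignores the century rules.
    return year % 4 == 0


def corrected_is_leap_year(year: int) -> bool:
    # Gregorian rule: divisible by 4, except centuries not divisible by 400.
    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)


# What someone without the domain knowledge might test.
NAIVE_CHECKS = [((2024,), True), ((2023,), False)]

# What someone who knows where this problem bites would add.
EXPERT_CHECKS = NAIVE_CHECKS + [((1900,), False), ((2000,), True)]
```

Under `NAIVE_CHECKS` the flawed function is accepted; under `EXPERT_CHECKS` it is rejected and only the corrected version passes. The tool and the gate are identical in both cases. The only thing that changed is the knowledge encoded in the checks.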
This is also the argument I made in All You Have Access To Is Knowledge and Tools; Never Intelligence, and I want to connect it here explicitly because it goes to exactly this point. What AI systems provide is precisely knowledge extracted from their training data and tools for manipulating that knowledge in ways that match statistical patterns from that training. What they do not provide is the capacity to reason about whether their outputs are correct in the specific context of your specific problem. That capacity has to come from somewhere external to the tool, and the only available source is the engineer using it. The engineer whose domain knowledge is deep enough to evaluate AI outputs critically is multiplied in capability by the tool. The engineer whose domain knowledge is shallow enough that they cannot evaluate AI outputs is given a false sense of capability by the tool, and that false sense is dangerous in exactly the proportion that the tool's outputs are wrong without the user noticing. Equal access does not mean equal outcome. It means equal exposure to a multiplier that amplifies existing capability differences.
There is a very specific kind of knowledge that matters most for using AI tools well, and it is not the kind of knowledge that appears on a resume or that can be tested in a standard technical interview. It is the knowledge of where the tool fails. Every domain where AI tools are deployed has characteristic failure modes, places where the statistical patterns in the training data diverge from the causal structure of the real problem, where the tool produces confident-sounding but wrong outputs, where the failure is subtle enough to pass an initial review but consequential enough to produce real problems downstream. An experienced engineer in that domain knows those failure modes from having seen them, from having been bitten by them, and from having developed habits and workflows that prevent them from propagating. That knowledge is the functional difference between a dangerous AI user and a productive one, and it is hard-won from domain experience that cannot be shortcut by API access. The person who gave you the API key has not given you the decade of domain experience that makes the API safe to use on problems where being wrong has real consequences.
Let me connect this to something very concrete from my own work. When I am building an agent pipeline using our Rusty autoGPT framework, the most important decisions I make are not about which model to call or how to structure the prompts. The most important decisions are about what the agent is not allowed to do, what outputs it must have verified by a human before they are acted on, what happens when the LLM call fails or returns something that does not parse, and how the system signals uncertainty in a way that a human operator can act on appropriately. Those decisions require understanding both the specific domain of the problem and the specific failure modes of the AI components being used, and they are decisions that a user with only API access and no domain depth cannot make well. The democratization of access does not democratize those decisions. It democratizes the ability to build something that looks like it is making those decisions while actually leaving them unaddressed.
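To make the first of those decisions tangible, here is a minimal sketch of a deny-by-default tool policy. Every name in it (`ToolPolicy`, `Decision`, the tool names) is hypothetical and chosen for illustration; it is not the framework's API, only one way the "what the agent is not allowed to do" decision could be written down so that it is explicit rather than implied.

```rust
use std::collections::HashSet;

/// Hypothetical decision type for illustration; not part of any real framework API.
#[derive(Debug, PartialEq)]
pub enum Decision {
    /// The tool may run without further checks.
    Allow,
    /// The tool may run only after a human signs off.
    NeedsHumanReview,
    /// The tool is outside the agent's scope.
    Deny,
}

/// Hypothetical policy: two explicit lists, everything else is refused.
pub struct ToolPolicy {
    allowed: HashSet<&'static str>,
    review_required: HashSet<&'static str>,
}

impl ToolPolicy {
    pub fn new(allowed: &[&'static str], review_required: &[&'static str]) -> Self {
        Self {
            allowed: allowed.iter().copied().collect(),
            review_required: review_required.iter().copied().collect(),
        }
    }

    /// Deny by default: a tool that appears on neither list is never run.
    /// The review list is checked first, so it wins if a tool is on both.
    pub fn check(&self, tool: &str) -> Decision {
        if self.review_required.contains(tool) {
            Decision::NeedsHumanReview
        } else if self.allowed.contains(tool) {
            Decision::Allow
        } else {
            Decision::Deny
        }
    }
}
```

The point of the sketch is the last branch: someone with domain knowledge has to decide what goes on each list, and the code only makes that decision visible.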
The historical analogy I find most clarifying here is the spreadsheet. The introduction of VisiCalc and then Lotus 1-2-3 and then Excel democratized access to financial modeling. Before those tools, you needed a team of accountants to build complex financial projections. After them, every manager with a computer could do it. But the democratization of access to financial modeling tools did not democratize financial modeling competence. It democratized the ability to produce financial models that looked credible without necessarily being correct. The history of corporate finance is full of disasters produced by managers who had full access to Excel and no real understanding of the financial structures they were modeling, who built confident-looking spreadsheet models that had fundamental logical errors or unrealistic assumptions baked into their structure. The accountants who were displaced from some modeling tasks did not become useless because everyone had Excel. They became more valuable as evaluators of the models that Excel made easy to build, because someone with real financial expertise was needed to tell the managers which of their models had errors they could not see. AI tools are the Excel of the current moment, and the engineers with domain expertise are the accountants. Access has shifted; expertise has not.
The argument that equal tool access produces equal outcomes also fails on the evidence of what actually happens when engineers use AI tools in practice. Empirical research on code generation tools like GitHub Copilot shows a stark disconnect between user expectation and actual experience (8). While programmers overwhelmingly prefer using AI tools because they provide useful starting points, the studies show this access does not actually improve task completion time or success rates. Instead, users struggle profoundly to understand, edit, and debug the generated code. A pattern emerges where users over-rely on the system without sufficiently validating its outputs, decreasing code quality because they fail to recognize when the AI has generated something that passes a surface syntax check but fails a deeper correctness check. The gap in outcomes is not driven by access to the tool—everyone has that. It is driven by the depth of domain knowledge the user brings to debugging and evaluating the tool's outputs. That means the right policy response to AI democratization is not "everyone has the tools, so expertise is less valuable" but exactly the reverse: "everyone has the tools, but only people with deep expertise can use them safely, so deep expertise is more valuable than before."
The frame I want to offer as a replacement for the "same tools" narrative is this: AI tools raise the ceiling on what an expert can accomplish and lower the floor on what bad work looks like. Both of those things are true simultaneously, and they have opposite implications for the value of expertise. The expert is more productive because the tool handles tedious routine work and the expert can focus on the high-judgment parts. The non-expert produces better-looking bad work because the tool adds surface polish to something that is still fundamentally wrong. From the outside, these two things look more similar than they used to, because the surface quality of the non-expert's output has improved. But the divergence in underlying correctness has not narrowed. If anything, it has widened, because the tool allows the non-expert to get further down the wrong path before the fundamental error becomes apparent. Organizations that are evaluating engineers on the surface quality of their AI-assisted output rather than on the depth of judgment behind it are making the most expensive version of the mistake I am describing, and they are going to pay for it in production incidents, in maintenance costs, and in the inevitable moment when the AI-generated system encounters a situation outside its training distribution and produces a confident wrong answer that no one on the team has the domain depth to catch.
What Actually Makes an Engineer Irreplaceable
I want to be concrete here rather than philosophical, because the philosophical dimension of this argument is well covered by the other sections and what engineers actually need is a clear sense of what they should be building and demonstrating. In my view, there are five properties that make an engineer genuinely irreplaceable in a technical organization in the current environment, and none of them are AGI-building ability and none of them are simple AI-tool fluency. They are properties that have always distinguished great engineers from merely competent ones, but they have become more visible and more economically important now because the AI tools have made the floor of apparently competent output easier to reach, which means the properties above the floor matter more for differentiation, not less.
The first property is problem selection. There is an enormous difference between an engineer who walks into a situation and builds the system that was asked for and an engineer who walks into the same situation, looks at what was asked for, and identifies the problem that actually needs to be solved, which is often different from the one being specified. Problem selection is the highest-leverage skill in engineering, because a perfect solution to the wrong problem is worse than an imperfect solution to the right one, and AI tools have no capacity for problem selection at all. They optimize for producing outputs that match the specified input, which means that if the specification is wrong, the output will be confidently wrong in a way that matches the wrong specification exactly. The engineer who asks "is this the right problem to solve before we optimize for how to solve it" is the engineer who saves organizations from building sophisticated solutions to the wrong problem, and that engineer is irreplaceable because no tool can substitute for the judgment about what problem is actually worth solving.
The second property is domain depth, which I have already discussed at length in the context of tool evaluation, but which also matters for a different reason: communication with domain experts and stakeholders who are not technical. An engineer who understands the business domain they are working in can translate technical constraints and tradeoffs into language that business stakeholders can engage with, can understand what the business stakeholders are actually asking for beneath the technical specification they have been given, and can build trust with those stakeholders through demonstrable understanding of their actual problems. This translation capacity is part of what makes engineering a business function rather than a purely technical one, and it is entirely independent of which AI tools you use. The AI can help you build the system once you understand what to build, but it cannot help you figure out what to build, because that requires the human connection between technical capability and domain understanding.
The third property is execution speed with quality, by which I mean the ability to move from specification to working system efficiently without accumulating technical debt that slows down future work. This is not the same as raw output speed. An engineer who ships a working feature quickly that is poorly structured, hard to test, and difficult to maintain has not demonstrated this property, even if the speed looked impressive in the short term. An engineer who ships a working feature at a moderate pace but with clear structure, good tests, and obvious extension points has demonstrated it. AI tools can increase raw output speed but they cannot enforce quality, and they often undermine it when used by engineers who lack the judgment to distinguish code that works from code that works well. The engineers who demonstrate this property consistently are invaluable because organizations can trust what they ship in a way they cannot trust the output of engineers whose speed is amplified by AI tools they are not yet competent to evaluate.
The fourth property is communication, and I mean this in the full sense that I described in As Engineers, LLMs Should Pay Us for Tokens Usage: the ability to create accurate shared understanding between yourself, your team, and your stakeholders about what you are building, why you are building it the way you are, what the risks are, and what the limits of the system are. This requires technical precision, the language skills to express technical ideas clearly to different audiences, and the honesty to be accurate about uncertainty and limitations rather than projecting confidence that is not warranted. The engineers who can do this well are enormously more valuable than the ones who cannot, because most of the failure modes in software development are not technical failures. They are communication failures where the engineers understood the technical situation but could not translate that understanding into shared understanding with the people who needed to make decisions based on it.
The fifth property is repeatability, by which I mean the ability to turn a one-off success into a process that can be relied upon and replicated. The engineer who solves a hard problem once is valuable. The engineer who solves a hard problem once and then documents what they learned, identifies the general pattern that the specific solution instantiated, and builds tools or processes that make future solutions to similar problems faster and more reliable is exponentially more valuable. This is the property that distinguishes artisan craftspeople from engineers in the meaningful sense, that engineers build systems while artisans solve cases. AI tools can help with individual cases efficiently, but they cannot build the institutional knowledge and repeatable processes that allow organizations to scale their technical capability beyond a small team of experts. The engineer who builds those processes is building organizational capability in a way that compounds over time, and that kind of compounding value is what justifies seniority in any technical organization.
These five properties, problem selection, domain depth, execution speed with quality, communication, and repeatability, are also the properties that are hardest to fake and easiest to evaluate by looking at a record of actual work rather than a performance in a technical interview. A portfolio of shipped systems, documented performance improvements, and written explanations of design decisions demonstrates all five of these properties in a way that passing a LeetCode-style interview cannot. I want to say directly that the shift toward portfolio-based evaluation that some technical hiring organizations have been making is a genuine improvement over the coding interview paradigm, not because coding interviews are entirely useless but because they measure a narrow technical skill disconnected from most of the properties I just described. An engineer who can pass every coding interview but produces brittle, hard-to-maintain systems is more visible and more consistently filtered out by portfolio evaluation than by interview performance. The organizations that have made this shift are making a better decision, and I predict more organizations will make it as the current tooling makes the gap between interview performance and production performance more apparent.
There is also a sixth property I almost left off the list because it feels obvious but is actually quite rare, and that is epistemic honesty. The engineer who says "I am not sure about that, let me check" when they do not know something is more valuable than the engineer who gives a confident wrong answer, because the wrong answer has to be corrected and the correction takes more time and organizational capital than the original uncertainty acknowledgment would have. The engineer who says "this system has the following limitations that you should be aware of before deploying it in this context" is more valuable than the engineer who says "it works great" and leaves the limitations to be discovered in production. The AI tools are systematically overconfident, as I described in the neural networks post, which means the engineers who compensate for that overconfidence by being more careful about acknowledging uncertainty are providing a genuine epistemic service to their organizations, not just a soft skill. Epistemic honesty is what turns AI-assisted development from a risk amplifier into a productivity amplifier, and the organizations that develop cultures where epistemic honesty is rewarded rather than punished are the ones that will capture the real value from AI tools without being destroyed by their failure modes.
Why the Rust AutoGPT Framework Changes What Is Possible
Now I want to talk about something I have been building, something I think represents a serious answer to the question of what it looks like to build real AI agent systems with the kind of reliability that the previous sections have been demanding. AutoGPT is not another Python wrapper around an OpenAI API call. It is a pure Rust framework for building autonomous AI agents, and the decision to build it in Rust rather than in Python is not an aesthetic preference. It is a principled engineering decision that reflects exactly the arguments I have been making throughout this post, about what serious builder behavior looks like when you apply genuine judgment to the question of which tools and languages serve the actual goals of production-quality agentic software.
Let me explain why Rust specifically, because the language choice matters more for agentic systems than it does for most other categories of software. An agent that operates autonomously makes decisions and takes actions without human approval at every step. That means errors in the agent's logic, memory management, or concurrency do not surface as an immediate human-visible bug and get fixed before they cause harm. They propagate through the agent's action pipeline and produce downstream consequences that may be difficult or impossible to reverse. Memory safety bugs in particular, the kind that safe Rust rules out at compile time through its ownership and borrowing system, are especially dangerous in agent software because an agent that has a use-after-free vulnerability or a data race in its action execution pipeline is an agent that can take arbitrary wrong actions in ways that are not predictable from the agent's intended logic. Python does not offer compile-time guarantees of this kind. It manages memory through reference counting and a garbage collector, which means pure Python code is not exposed to use-after-free bugs, but its dynamic typing and the absence of compile-time checks mean that entire categories of logical and type errors that Rust catches at compile time only surface in Python at runtime, which in an agent system means they surface when the agent is running in a production environment and potentially taking irreversible actions.
The performance argument for Rust in agent systems is real, and the cost it addresses compounds across a pipeline. When you have an agent that needs to process streaming outputs from multiple LLM calls, coordinate between multiple sub-agents, manage a vector memory store, and maintain state across a long-running task, the overhead of Python's interpreter, its global interpreter lock (GIL), which prevents true parallelism in CPU-bound code in standard CPython, and its dynamic dispatch mechanism begin to create genuine latency and throughput limitations that no amount of async optimization can fully compensate for. Rust's zero-cost abstractions and fearless concurrency, meaning the ability to write truly parallel code that the compiler guarantees is free of data races, allow agent systems built in it to run at a fundamentally different performance tier. This matters not as a vanity metric but because latency in agent orchestration compounds: an agent that runs a ten-step pipeline and pays interpreter overhead at every step completes meaningfully slower than one where that overhead is absent, and in production systems where agents are running continuously on behalf of users or organizations, that latency difference translates directly into cost, responsiveness, and the feasibility of certain deployment patterns.
The type system argument is the one I find most compelling from a pure engineering perspective. Rust's type system, with its algebraic data types, exhaustive pattern matching, and the absence of null pointers through the Option type, forces you to handle all the failure modes that Python code often leaves unhandled because handling them is optional and handling them explicitly takes more code. In an agent system, the failure modes are numerous and they interact: an LLM call can fail with a network error, or return a response that does not parse, or return a parseable response that does not have the expected structural properties, or return a structurally valid response that is semantically wrong in a domain-specific way. Handling these failure modes in Python typically means adding defensive code that can be omitted, that is more verbose than it looks from the happy path, and that tends to drift out of sync with the actual error conditions over time. Handling them in Rust is not optional. The type system enforces exhaustive handling of error variants, which means the wiring between agent steps in the framework fails to compile rather than failing at runtime if you have not addressed a failure mode. That compile-time enforcement is the difference between a system that handles errors when the developer remembered to add error handling and a system that handles errors because the language itself requires it before the code can be executed.
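Here is a small sketch of what that enforcement looks like for the four failure modes I just listed. The type and function names (`LlmStepError`, `NextAction`, `next_action`) are hypothetical and exist only for this example, and the retry-versus-escalate policy is an assumption, not a description of the framework's internals.

```rust
/// Hypothetical error type: one variant per failure mode described above.
#[derive(Debug, PartialEq)]
pub enum LlmStepError {
    /// The call never produced a response.
    Network(String),
    /// A response arrived but could not be parsed.
    Unparseable(String),
    /// The response parsed but lacks an expected field.
    WrongShape { missing_field: &'static str },
    /// The response is well-formed but fails a domain-specific check.
    DomainInvalid(String),
}

/// Hypothetical set of things the orchestrator may do next.
#[derive(Debug, PartialEq)]
pub enum NextAction {
    Proceed(String),
    Retry,
    EscalateToHuman(String),
}

/// Decide what happens after one LLM step.
/// There is deliberately no `_ =>` arm: adding a new `LlmStepError` variant
/// makes this function fail to compile until the new case is handled.
pub fn next_action(result: Result<String, LlmStepError>) -> NextAction {
    match result {
        Ok(answer) => NextAction::Proceed(answer),
        // Transient problems: assume a retry is safe (an assumption of this sketch).
        Err(LlmStepError::Network(_)) => NextAction::Retry,
        Err(LlmStepError::Unparseable(_)) => NextAction::Retry,
        // Structural and semantic problems go to a person rather than being guessed at.
        Err(LlmStepError::WrongShape { missing_field }) => {
            NextAction::EscalateToHuman(format!("response is missing field `{}`", missing_field))
        }
        Err(LlmStepError::DomainInvalid(reason)) => NextAction::EscalateToHuman(reason),
    }
}
```

The compiler cannot tell you whether retrying is the right policy. What it can do is refuse to build a pipeline in which one of the listed failure modes has no policy at all.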
Our AutoGPT Rust framework's architecture itself reflects these principles in its design. The framework's agent core is built from composable components including tools and sensors that interface with the real world via actions and perceptions, memory and knowledge that combines long-term vector memory with structured knowledge bases for reasoning and recall, a planner and goals module that breaks down complex tasks into subgoals and tracks progress dynamically, and a self-reflection module for introspection that can debug, adapt, or evolve internal strategies. The Mixture of Providers (MoP) feature, which allows parallel fan-out and weighted scoring across multiple AI backends, is exactly the kind of architecture that is far easier to implement correctly in Rust than in Python, because the parallel execution and the score aggregation each have concurrency requirements that Rust's ownership system makes easy to reason about and that Python's GIL and threading model make genuinely difficult to get right without data races or deadlocks. The YAML-based no-code agent configuration is a deliberate choice to make the framework accessible to builders who understand agents conceptually but who should not have to modify the orchestration layer to define what a specific agent does and how it behaves.
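To illustrate the fan-out idea without claiming anything about the framework's actual implementation, here is a sketch using only the standard library. The `Provider` struct, the `fan_out` function, and the scoring rule (raw score multiplied by a per-provider weight) are all assumptions made for this example; real backends would be network clients rather than plain function pointers.

```rust
use std::thread;

/// Hypothetical provider description. `call` returns an answer and a raw score.
pub struct Provider {
    pub name: &'static str,
    pub weight: f64,
    pub call: fn(&str) -> (String, f64),
}

/// Send the same prompt to every provider in parallel and keep the answer
/// with the highest weighted score. Returns `None` if no provider answered.
pub fn fan_out(providers: &[Provider], prompt: &str) -> Option<(&'static str, String, f64)> {
    let scored: Vec<(&'static str, String, f64)> = thread::scope(|s| {
        // Spawn every call first so they actually run concurrently.
        let handles: Vec<_> = providers
            .iter()
            .map(|p| {
                s.spawn(move || {
                    let (answer, raw_score) = (p.call)(prompt);
                    (p.name, answer, raw_score * p.weight)
                })
            })
            .collect();
        // A provider thread that panicked is dropped from the results.
        handles.into_iter().filter_map(|h| h.join().ok()).collect()
    });
    // `total_cmp` gives a total order on f64, so NaN scores cannot break the comparison.
    scored.into_iter().max_by(|a, b| a.2.total_cmp(&b.2))
}
```

Scoped threads let each worker borrow `providers` and `prompt` directly, and the borrow checker verifies that no worker can outlive the data it reads. That is the kind of reasoning I mean when I say the ownership system makes the concurrency requirements easy to state.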
The comparison with the Python-based AutoGPT framework that has received enormous attention since 2023 is instructive precisely because both projects share a name and a general concept, autonomous AI agents that can break down tasks, use tools, and self-direct toward a goal, while representing radically different engineering approaches. The Python AutoGPT accumulated users and attention quickly because Python is the lingua franca of the AI tooling ecosystem and the barrier to getting started is very low. But it also accumulated a reputation for unreliability, for being difficult to deploy in production, and for the kinds of runtime errors that only surface in production because Python's dynamic type system cannot catch them earlier. Memory management in long-running Python agent processes is a known challenge because Python's garbage collector is not designed for the patterns of intense, short-lived object creation and destruction that characterize LLM-integrated agent pipelines. The Rusty AutoGPT framework makes the opposite tradeoff: a steeper initial learning curve in exchange for a system that the compiler has vetted for safety and efficiency before it runs a single line of agent logic.
The four operating modes that the framework supports, interactive (GenericGPT), direct prompt, standalone agentic, and distributed agentic (orchestrated), represent a thoughtful progression from simple to complex that allows builders to start with something understandable and scale to something powerful without abandoning the framework. The interactive mode, where the agent responds to individual prompts with appropriate tool use, is the right starting point for understanding what the agent architecture provides over a direct API call. The standalone agentic mode, where the agent pursues a multi-step goal without human input at each step, is where the real architectural value of the framework becomes apparent, because this is where the difference between a well-typed, memory-safe orchestration layer and a dynamically-typed one starts producing measurable outcomes in system reliability. The orchestrated mode, where multiple agents form a network coordinated by an orchestrator, is where the true frontier of agent system design lives currently, and where the concurrency guarantees of Rust provide the most decisive architectural advantage over Python-based alternatives.
There are nine built-in specialized autonomous agents in the current release, which is not a large number but it is enough to demonstrate the pattern of what a specialized, reliable agent looks like in practice. Specialization matters here for the same reason I argued earlier in this post that domain depth matters for engineers: a specialized agent has a narrower scope of action, a more clearly defined set of tools it is allowed to use, and a more specific definition of what success looks like. Those constraints make it easier to evaluate the agent's behavior, easier to catch failures before they propagate, and easier to trust the system in a production context where reliability is not optional. A general-purpose agent that can do anything is impressive in a demo and dangerous in a production system, because the broader the scope of action, the more ways there are to fail, and the harder it is to predict or constrain. The framework's emphasis on specialized agents is an engineering philosophy, not a limitation, and it is the right philosophy for the current state of agent technology.
Building Something Narrow, Deep, and Real
The most important practical insight I want to leave in this post is also the simplest one: the best systems are not the most general ones. The best systems are the ones that do the right specific thing reliably enough to be trusted and deployed in the specific context where they are actually useful. I have been saying versions of this across multiple posts in this blog, in LLMs Are Useful. LMMs Will Break Reality, where I argued that the specific, constrained use of language model tools is where their value is most cleanly realized, and in Rethinking ARC-AGI, where I argued that narrow competence is more demonstrable and more verifiable than general intelligence and therefore more reliable as a basis for building systems. The principle holds even more strongly when you are building agentic systems, because agents that can do anything are agents that can do anything wrong, and broader scope without proportionally better judgment is just a larger blast radius for failures.
What does it actually look like to build something narrow, deep, and real? Let me be concrete, because abstractly advocating for narrowness is easy and the real discipline is in the specifics. A narrow agent for a legal firm's document review process might be designed to do exactly one thing: classify incoming discovery documents into one of five predefined categories, flag documents that have characteristics associated with privilege in the firm's specific practice area, and produce a structured output for each document that includes the classification, the flagging decision, and the three most relevant passages that support each decision. That is a narrow scope. The agent knows what it is for, what it is not for, what tools it can use and what it cannot, what the output structure must be, and what happens when the classification confidence is below a threshold. Every one of those constraints is a design decision that requires domain expertise. The person who designed the five categories had to understand discovery practice well enough to make the categories meaningful and mutually exclusive. The person who defined privilege characteristics had to understand the firm's specific practice area. The person who designed the confidence threshold had to understand how reviewers actually use the tool and what the cost of a false flag is compared to the cost of a miss.
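A sketch of how that scope might be pinned down in types follows. The category names, the `Review` struct, and the `gate` function are hypothetical placeholders for illustration; a real firm would supply its own five categories and its own threshold, and choosing those is exactly the domain work described above.

```rust
/// Hypothetical categories; placeholders for whatever five the firm defines.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Category {
    Contract,
    Correspondence,
    Financial,
    Regulatory,
    Other,
}

/// The structured output the agent must produce for every document.
#[derive(Debug, PartialEq)]
pub struct Review {
    pub category: Category,
    pub privilege_flag: bool,
    /// Between one and three passages supporting the decision.
    pub supporting_passages: Vec<String>,
}

#[derive(Debug, PartialEq)]
pub enum Outcome {
    Accepted(Review),
    NeedsHumanReview { reason: &'static str },
}

/// Accept a review only if the confidence is a valid probability at or above
/// the threshold and the supporting passages are present.
pub fn gate(review: Review, confidence: f64, threshold: f64) -> Outcome {
    // NaN fails this range check, so it is routed to a human as well.
    if !(0.0..=1.0).contains(&confidence) {
        return Outcome::NeedsHumanReview { reason: "confidence is not a valid probability" };
    }
    if confidence < threshold {
        return Outcome::NeedsHumanReview { reason: "confidence below threshold" };
    }
    let n = review.supporting_passages.len();
    if n == 0 || n > 3 {
        return Outcome::NeedsHumanReview { reason: "expected one to three supporting passages" };
    }
    Outcome::Accepted(review)
}
```

Nothing in this code knows what privilege means or what a sensible threshold is. It only guarantees that whatever the domain expert decided is applied to every document in the same way.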
None of those design decisions are made by the agent. All of them are made by the engineer who builds it. The agent executes the design with speed and consistency that a human reviewer cannot match. The engineer is the one who guarantees that the design is correct, that the constraints are the right ones, and that the failure modes are handled appropriately. That division of labor is exactly right, and it is the division of labor that serious agent system design should be preserving and strengthening rather than trying to collapse into a single general agent that does everything and guarantees nothing. The Rust AutoGPT framework is built around exactly this principle: agents are composable, specialized, and defined through declarative YAML configurations that make the scope of action explicit and auditable. You do not have to modify the orchestration layer to define what an agent does. You describe the agent's persona, its goals, its tools, and its constraints in a configuration file, and the framework executes that description with the reliability guarantees that Rust provides. That is how serious builders work: they specify the system's behavior explicitly and rely on the runtime to execute it reliably.
The auditability of agent behavior is the piece that separates serious deployments from demo-ware, and I want to spend some time on it because it is systematically undervalued in the current cultural conversation around agents. An auditable system is one where you can look at exactly what the agent decided, why it decided it based on what inputs, and what actions it took in response. Without that auditability, you cannot diagnose failures, you cannot demonstrate compliance with regulatory requirements, you cannot build organizational trust in the system, and you cannot improve the system over time in a principled way because you do not have a clear record of where it is currently failing and how. The agent index research found that many deployed agent systems lack transparent documentation of their safety-relevant behavior, which is a polite way of saying that many deployed agents are black boxes and nobody can fully explain why they do what they do in production (3). That is a state of affairs that should be unacceptable in any system that is making decisions with real consequences, and it is a state of affairs that serious builders should be actively refusing to accept as normal.
The question of what a human must approve before an agent acts is the most important design question in any agentic system, and it is almost never answered well in the systems I see described in conference talks and blog posts. The instinct is to minimize human involvement because human involvement reduces the speed and autonomy that make agents valuable in the first place. But the right answer is not "minimize human involvement" as a goal. The right answer is "identify carefully which decisions are reversible and which are not, which failure modes are survivable and which are not, and require human approval for everything that is irreversible or unsurvivable." Sending an internal status update is reversible if you send the wrong one. Sending a message to a customer is harder to reverse. Executing a financial transaction is extremely difficult to reverse. Deleting data is often impossible to reverse. The level of human oversight required before an action should scale with the irreversibility and consequence of the action, not with the organizational desire to minimize the number of clicks. Any agent framework that does not make this distinction explicit and allow builders to implement it cleanly is an agent framework that is making deployment easier and safety harder.
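One way to make that distinction explicit is to attach a reversibility class to every action type and derive the oversight requirement from it. The sketch below uses the four examples from this paragraph; the enum names and the specific approval counts (zero, one, two) are assumptions for illustration, not a recommendation for any particular deployment.

```rust
/// How hard an action is to undo once taken.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum Reversibility {
    Reversible,
    HardToReverse,
    Irreversible,
}

/// Hypothetical action kinds, taken from the examples in the text.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Action {
    InternalStatusUpdate,
    CustomerMessage,
    FinancialTransaction,
    DeleteData,
}

impl Action {
    /// Classification is a domain judgment; these assignments are assumptions.
    pub fn reversibility(self) -> Reversibility {
        match self {
            Action::InternalStatusUpdate => Reversibility::Reversible,
            Action::CustomerMessage => Reversibility::HardToReverse,
            // Treated as irreversible for policy purposes in this sketch.
            Action::FinancialTransaction => Reversibility::Irreversible,
            Action::DeleteData => Reversibility::Irreversible,
        }
    }
}

/// Oversight scales with irreversibility, not with convenience.
pub fn approvals_required(action: Action) -> u8 {
    match action.reversibility() {
        Reversibility::Reversible => 0,
        Reversibility::HardToReverse => 1,
        Reversibility::Irreversible => 2,
    }
}

/// The agent may act only once enough humans have signed off.
pub fn may_execute(action: Action, approvals_granted: u8) -> bool {
    approvals_granted >= approvals_required(action)
}
```

Because both `match` expressions are exhaustive, adding a new action kind forces someone to classify it before the code compiles, which is where the design conversation about oversight ought to happen.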
The lmm project I described in Mathematical Equations Are Multimodal by Default and Training Is an Evil Concept. LMMs Eliminates it Altogether is the companion to the autogpt framework in the Wise AI ecosystem: it is a parallel agent intelligence project that does not use LLMs at all, instead using equation-based intelligence to reason without gradient-trained models. The two projects represent different arms of the same commitment, which is to build AI systems that are honest about what they are, explicit about what they can and cannot do, and designed from the ground up for the kind of reliability that production environments require rather than for the kind of impressiveness that conference demos require. I am not advocating for abandoning LLMs as a tool. I am advocating for putting them in a framework that respects their limitations, compensates for their failure modes, and makes their behavior auditable and controllable. That is what serious building looks like, and it is the opposite of building a general autonomous agent and hoping for the best.
The measurement problem is the last thing I want to address in this section, and it is important enough that I want to be explicit about it rather than leaving it implicit. A system you cannot measure is a system you cannot improve, and a system you cannot improve is a system that degrades over time as the distribution of real-world inputs drifts from the distribution the system was designed for. Every serious agent deployment needs explicit metrics for what success looks like, explicit mechanisms for capturing the data needed to compute those metrics at production scale, explicit alert conditions for when the metrics drop below acceptable thresholds, and explicit processes for reviewing agent decisions when the metrics indicate a potential failure. Those are not nice-to-haves. They are the minimum requirements for a system that you can honestly describe as deployed rather than as a demo that happens to be running in production. Building these measurement systems requires exactly the kind of engineering judgment I described in the previous section: problem selection, domain depth, execution with quality, and the ability to design something repeatable. None of it requires AGI. All of it requires a serious engineer.
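As a rough sketch of the smallest version of that loop, the following tracks one metric over a rolling window and raises an alert condition when it drops below a threshold. The class name `SuccessRateMonitor`, the window size, and the 0.8 threshold are assumptions chosen for illustration; a real deployment would define success per task and wire the alert into whatever review process it already has.

```python
from collections import deque


class SuccessRateMonitor:
    """Tracks a rolling success rate and flags when it falls below a threshold."""

    def __init__(self, window, min_success_rate):
        # deque with maxlen keeps only the most recent `window` outcomes.
        self.outcomes = deque(maxlen=window)
        self.min_success_rate = min_success_rate

    def record(self, success):
        self.outcomes.append(bool(success))

    def success_rate(self):
        if not self.outcomes:
            return 1.0  # assumption: no data yet is treated as healthy
        return sum(self.outcomes) / len(self.outcomes)

    def should_alert(self):
        # Alert only once the window is full, to avoid noise from a few samples.
        window_full = len(self.outcomes) == self.outcomes.maxlen
        return window_full and self.success_rate() < self.min_success_rate


# Hypothetical usage: three successes followed by two failures.
monitor = SuccessRateMonitor(window=5, min_success_rate=0.8)
for ok in [True, True, True, False, False]:
    monitor.record(ok)

if monitor.should_alert():
    print(f"success rate {monitor.success_rate():.0%} is below threshold; review recent decisions")
```

A rolling window is used here because the concern in the paragraph above is drift: a lifetime average would hide a recent decline behind months of earlier good behavior.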
The Unasked Question Beneath the Provocative One
I want to spend some time on what I think the people asking the AGI question are actually worried about, because beneath the provocative framing there is a real anxiety that deserves a real answer rather than just a demolition of the bad frame. The real anxiety is this: in an environment where AI tools are rapidly becoming far more capable, is the skill base I have built still going to matter in three years? That is a legitimate question, and it is being asked by engineers who have spent years building expertise and who are genuinely uncertain whether that expertise will remain valuable as the tools get better. I want to answer it honestly, which means I have to say both the reassuring and the uncomfortable parts, because partial answers to this kind of question do more harm than no answer at all.
The reassuring part is what I have argued throughout this post: the skills that make engineers genuinely valuable, problem selection, domain depth, evaluation judgment, clear communication, and the ability to build repeatable systems, are not the skills that AI tools replace. They are the skills that AI tools amplify for people who have them and expose as absent in people who lack them. The domain expert who can evaluate AI outputs correctly is more valuable than the domain expert who cannot, because they can now get the routine generation work done faster and spend more of their time on the high-judgment parts. The engineer who can design a reliable agentic system and make its behavior auditable is more valuable than the engineer who cannot, because agentic systems that work reliably are enormously consequential in any organization that deploys them. The person who can communicate clearly about what the system does, what it does not do, and where the risks are is more valuable than before for the same reason: the stakes of misunderstanding are higher when the systems are more capable.
The uncomfortable part is also real, and I would be lying if I left it out. There are categories of engineering work that AI tools are genuinely replacing rather than amplifying, and the engineers who have built their careers entirely within those categories are in a harder position. Routine code generation, boilerplate, and templating tasks: tools do this faster and often just as well as humans. Standard data pipeline construction using well-documented frameworks: tools can scaffold this quickly from specifications. Documentation of code that is being written by humans or tools simultaneously: tools are quite good at this. The pattern across these categories is that they are tasks where the domain knowledge required to do the task is well represented in training data, the output is easily verifiable by a human with less expertise than is required to produce it, and the task is well-enough defined that a statistical approximation of the standard solution is often the right answer. If your career is primarily composed of tasks in these categories, the tools are a genuine displacement and the right response is to build the domain depth and judgment skills that sit above those tasks in the value chain, not to compete with tools at what tools do well.
The specific career trajectory I would recommend to an engineer navigating this transition is something like this: pick a domain, not a framework and not a tool, a domain, meaning an area of substantive human activity that software systems help with, and invest deliberately in understanding that domain at a depth that allows you to evaluate AI outputs about it critically. Pick a domain where the consequences of wrong AI outputs are significant enough that evaluation expertise is genuinely valuable, because the premium on evaluation expertise is proportional to the cost of unchecked AI errors. Build systems in that domain that are narrow, well-specified, measurable, and auditable. Document what you build and what you learn from deploying it. Make your judgment visible by writing about the design decisions you made and why, about the failure modes you encountered and how you addressed them, and about the tradeoffs you navigated that a less thoughtful builder would have ignored. That portfolio of documented judgment is the thing that differentiates you from someone who has the same tool access and less domain depth, and it is the thing that a thoughtful hiring organization can evaluate without relying on AGI affiliation as a proxy.
I came to this conclusion through a lot of painful experience that I have documented in various posts throughout this blog. I built systems for clients who paid me almost nothing and then used what I built to raise funding and launch products. I applied for roles where I was told I was overqualified or underexperienced, depending on which angle served the rejection better. I watched people with better institutional affiliations and less technical depth get opportunities that I could not access. All of that is in Just Don't Pick Up the Brush and Technology Has Destroyed My Livelihood and An Empty Life Filled With Constant Suffering in more detail than I usually allow myself to write in. What I am describing in this post as a career strategy is not something I arrived at from a position of professional comfort. It is what I am actually doing, with real stakes, because I believe it is the right answer and I have examined the alternative answers carefully enough to know why they are wrong.
The specific thing that the autogpt Rust framework represents in this context is not just a technical tool for building agents. It is a demonstration of what it looks like to apply genuine engineering judgment to a problem that the broader market is solving with less rigor. Building a production-quality agent framework in Rust, with its steeper learning curve and its compile-time discipline, when Python frameworks with lower barriers to entry are available, is an argument made in code about what the actual requirements of reliable agentic systems are. It is the same kind of argument that serious builders make implicitly every time they choose the harder right thing over the easier wrong thing: the choice itself demonstrates the judgment. And demonstrating judgment through choices is how you build a portfolio that communicates something real to someone who knows what they are looking at, which is the only kind of communication that matters when you are trying to prove your value without a prestigious institution behind you.
If You Can't Build AGI, Then Why Should We Hire You?
I want to end where the title began, because that question deserves a direct answer rather than just a destruction of the frame it comes from. I have spent several thousand words explaining why the AGI criterion is the wrong criterion, why the market is re-sorting around different properties, and why those properties are available to anyone willing to develop them regardless of their institutional affiliation. But the person who asked the title question in a real interview deserves a real answer that does not feel like a lecture about the epistemology of AI. So here is the answer I would give, and I want it to be honest and direct rather than polished and strategic.
If you cannot build AGI, then you should be hired because you can build something that works. Working, in this context, means several specific things that I have laid out across this post. It means the system does the specific thing it was designed to do reliably enough to be trusted in the context where it will be used. It means the system fails gracefully and predictably rather than confidently and silently. It means the system is understandable by the people who will maintain it and auditable by the people who will be accountable for its behavior. It means the system can be measured, and when it degrades, the degradation is visible before it becomes catastrophic. Those properties are not glamorous. They do not make good conference talk titles. They do not generate the same kind of cultural excitement that AGI building ability generates. But they are what separates systems that create real value from systems that create impressive demonstrations, and the gap between those two categories is enormous.
I also want to say something about honesty as a hiring criterion, because it connects to everything I have been arguing in this post and I do not want to leave it implicit. The engineers who are most valuable to organizations are the ones who can be trusted to say true things, not just impressive things. The engineer who says "this will work for these cases and fail for these other cases, and here is how I would handle the failure cases" is more valuable than the engineer who says "this will work" and leaves the failure cases to be discovered in production. The engineer who says "I don't know how to do this, but here is how I would find out and here is how long I think that would take" is more valuable than the engineer who says "sure I can do that" and then figures it out badly under time pressure. The cultural pressure toward AGI affiliation is a cultural pressure toward a specific kind of impressive-sounding dishonesty, toward overpromising against an underspecified goal, and the engineers who resist that pressure and maintain epistemic honesty are doing something more valuable than building a better demo.
The connection to Be Aware of the Current UFOs Pandemic. Remember, We Are Alone. is not superficial, and I want to make it explicit because it is one of the things I have been thinking about throughout writing this post. In that piece, I argued that the UFO panic is not primarily about UFOs. It is about the human need to believe that something outside our current situation, something powerful and superintelligent, is watching and might intervene. I argued there that the silence of the universe is clarifying rather than hopeless, because it places the full responsibility for our situation on us rather than on a savior from outside. The AGI panic is a different version of the same psychological pattern. It positions AGI as the coming superintelligence that will arrive and change everything, which creates both the fear of being made obsolete and the hope of being on the right side of the transition by being among those who summon it. Both the fear and the hope are responses to the same underlying fact: we are responsible for the systems we build, there is no superintelligence coming to take that responsibility off us, and the quality of what we build depends entirely on the quality of the judgment we bring to building it. That is both terrifying and clarifying, exactly as I said about the silence of the universe.
I am building autogpt in Rust and the companion lmm project without gradient-based training because I want to build things that are honest about what they are. The autogpt framework makes explicit what the agent is allowed to do, through its composable architecture and YAML configurations. The lmm project is an attempt to build machine intelligence that discovers the structure of reality rather than statistically approximating the structure of human text about reality. Neither of these projects is building AGI. They are both trying to build systems that are genuinely useful for the specific things they do, genuinely reliable in the specific contexts they are used, and genuinely honest about the gap between what they do and what an intelligent human observer would do in the same situation. That is the standard I hold my own work to, and it is the standard I am advocating for in this post: not "can you build AGI" but "can you build something real that you can be honest about".
So if you ask me, in an interview or anywhere else, "if you can't build AGI, then why should we hire you", I will answer like this. I should be hired because I can identify the problem that is actually worth solving in the situation you are in. I can build a system that solves that problem reliably within the constraints that actually exist, not the constraints that would exist in an ideal world. I can measure whether it is working and tell you honestly when it is not. I can maintain it and evolve it as the situation changes without accumulating technical debt that forces a rewrite in eighteen months. I can explain what I built and why to people who need to understand it in order to use it or govern it or improve it. And I can do all of this without pretending that what I built is more than it is, because pretending is expensive and honesty is the only engineering practice that compounds in the direction of reliability rather than in the direction of eventual disaster. That is not AGI. That is the actual job. And that is why you should hire me.
Till next time 👋!
References
1. Altman, S., Planning for AGI and Beyond, OpenAI Blog, February 24, 2023. openai.com
2. Morris, M. R., Sohl-Dickstein, J., Fiedel, N., Warkentin, T., Dafoe, A., Faust, A., Farabet, C. & Legg, S., Levels of AGI for Operationalizing Progress on the Path to AGI, arXiv:2311.02462, November 2023. arXiv:2311.02462
3. Kapoor, S. et al., The 2025 AI Agent Index: Documenting Technical and Safety Features of Deployed Agentic AI Systems, MIT AI Agent Index, February 2026. aiagentindex.mit.edu
4. World Economic Forum, Future of Jobs Report 2025, January 7, 2025. weforum.org
5. International Labour Organization, How might generative AI impact different occupations?, ILO Research Article, May 20, 2025. ilo.org
6. METR Research, Measuring the Impact of Early-2025 AI on Experienced Open-Source Developer Productivity, METR Blog, July 10, 2025; companion paper arXiv:2507.09089. metr.org
7. Turing, A. M., Computing Machinery and Intelligence, Mind, Vol. 59, No. 236, pp. 433-460, October 1950. doi.org/10.1093/mind/LIX.236.433
8. Vaithilingam, P., Zhang, T. & Glassman, E. L., Expectation vs. Experience: Evaluating the Usability of Code Generation Tools Powered by Large Language Models, ACM CHI '22 Extended Abstracts, April 2022. doi.org/10.1145/3491101.3519665