David Monnerat

Dad. Husband. Product + AI. Generalist. Endlessly Curious.

Tag: machine learning

  • Automation’s Hidden Effort

    Automation’s Hidden Effort

    In the early 2000s, as the dot-com bubble burst, I found myself without an assignment as a software development consultant. My firm, scrambling to keep people employed, placed me in an unexpected role: a hardware testing lab at a telecommunications company.

    The lab tested cable boxes and was the last line of defense before new devices and software were released to customers. These tests consisted of following steps in a script tracked in Microsoft Excel to validate different features and functionality and then marking the row with an “x” in the “Pass” or “Fail” column.

    A few days into the job, I noticed that, after completing a test script, some of my colleagues would painstakingly count the “x” marks in each column and then populate the summary at the end of the spreadsheet.

    “You know, Excel can do that for you, right?” I offered, only to be met with blank stares.

    “Watch.”

    I showed them how to use simple formulas to tally results and then added conditional formatting to highlight failed steps automatically. These small tweaks eliminated tedious manual work, freeing testers to focus on more valuable tasks.
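    The tally itself is simple enough to sketch. In Excel it would typically be a COUNTIF-style formula over each column; the Python version below is only an illustration of the same idea, and the row layout, step names, and function names are assumptions rather than the lab's actual spreadsheet.

```python
from collections import Counter


def tally_results(rows):
    """Count pass, fail, and not-run steps from (step, pass_mark, fail_mark) rows."""
    counts = Counter()
    for _step, pass_mark, fail_mark in rows:
        if pass_mark.strip().lower() == "x":
            counts["pass"] += 1
        elif fail_mark.strip().lower() == "x":
            counts["fail"] += 1
        else:
            counts["not run"] += 1
    return counts


def failed_steps(rows):
    """Return the names of failed steps, the rows a highlight rule would flag."""
    return [step for step, _pass_mark, fail_mark in rows
            if fail_mark.strip().lower() == "x"]


# Hypothetical test script rows: (step, "Pass" column, "Fail" column).
rows = [
    ("Power on", "x", ""),
    ("Open guide", "x", ""),
    ("Tune to channel", "", "x"),
    ("Open settings", "", ""),
]

summary = tally_results(rows)
print(dict(summary))
print(failed_steps(rows))
```

    The second function plays the role that conditional formatting played in the spreadsheet: it surfaces the failures so nobody has to scan for them.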

    That small win led to a bigger challenge. My manager handed me an unopened box of equipment—an automated testing system that no one had set up.

    “You know how to write code,” he said. “See if you can do something with that.”

    Inside were a computer, a video capture card, an IR transmitter, and an automation suite for running scripts written in C. My first script followed the “happy path,” assuming everything worked perfectly. It ran smoothly—until it didn’t. When an IR signal was missed, the entire test derailed, failing step after step.

    To fix it, I added verification steps after every command. If the expected screen didn’t appear, the script would retry or report a failure. Over weeks of experimentation, I built a system that ran core regression tests automatically, flagged exceptions, and generated reports.
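    A minimal sketch of that verify-and-retry pattern follows. The original scripts were written in C against a real automation suite; this Python version, including the FlakyBox stand-in and every function name in it, is hypothetical and only shows the shape of the logic.

```python
def send_with_verification(send, read_screen, command, expected_screen, max_attempts=3):
    """Send a command, then confirm the expected screen appeared; retry on a miss."""
    for attempt in range(1, max_attempts + 1):
        send(command)
        if read_screen() == expected_screen:
            return {"command": command, "passed": True, "attempts": attempt}
    # Every attempt missed, so report a failure instead of carrying on blindly.
    return {"command": command, "passed": False, "attempts": max_attempts}


class FlakyBox:
    """Stand-in device that ignores the first signal it receives (hypothetical)."""

    def __init__(self):
        self.screen = "home"
        self.dropped_one = False

    def send(self, command):
        if not self.dropped_one:
            self.dropped_one = True  # simulate a missed IR signal
            return
        self.screen = {"GUIDE": "guide", "MENU": "menu"}.get(command, self.screen)

    def read_screen(self):
        return self.screen


box = FlakyBox()
result = send_with_verification(box.send, box.read_screen, "GUIDE", "guide")
print(result)
```

    Without the check, the missed signal would leave the script on the wrong screen and every later step would fail for the wrong reason. With it, the miss costs one retry and the report shows how many attempts each step needed.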

    When I showed my manager the result, he watched the screen in amazement. As if by magic, the cable box navigated to different screens and tested various actions. At the end of the demo, he directed me to automate more tests.

    What he didn’t see in the demo was the effort behind the scenes—the constant tweaking, exception handling, and fine-tuning to account for the messy realities of real-world systems.

    The polished demo sent a simple message:

    Automation is here. No manual effort is needed.

    But that wasn’t the whole story. Automation, while transformative, is rarely as effortless as it appears.

    Operator: Automation’s New Chapter

    The lessons I learned in that testing lab feel eerily relevant today.

    In January 2025, OpenAI released Operator. According to OpenAI[1]:

    Operator is a research preview of an agent that can go to the web to perform tasks for you. It can automate various tasks—like filling out forms, booking travel, or even creating memes—by remotely interacting with a web browser much as a person would, via mouse clicks, scrolling, and typing.

    When I saw OpenAI’s announcement, I had déjà vu. Over 20 years ago, I built automation scripts to mimic how customers interacted with cable boxes—sending commands, verifying responses, and handling exceptions. It seemed simple in theory but was anything but in practice.

    Now, AI tools like Operator promise to navigate the web “just like a person,” and history is repeating itself. The demo makes automation look seamless, much like mine did years ago. The implicit message is the same:

    Automation is here. No manual effort is needed.

    But if my experience in test automation taught me anything, it’s that a smooth demo hides a much messier reality.

    The Hidden Complexity of Automation

    At a high level, Operator achieves something conceptually similar to what I built for the test lab—but with modern machine learning. Instead of writing scripts in C, it combines large language models with vision-based recognition to interpret web pages and perform actions. It’s a powerful advancement.

    However, the fundamental challenge remains: the real world is unpredictable.

    In my cable box testing days, the obstacles were largely technological. The environment was controlled, the navigation structure was fixed, and yet automation still required extensive validation steps, exception handling, and endless adjustments to account for inconsistencies.

    With Operator, the automation stack is more advanced, but the execution environment, the web, is far less predictable. Websites are inconsistent. Navigation is not standardized. Pages change layouts frequently, breaking automated workflows. Worse, many sites actively fight automation with CAPTCHAs[2], anti-bot measures, and dynamic content loading. While automation tools like Operator try to get around these anti-bot techniques, their effectiveness and ethics are still debatable.[3,4]

    The result is another flashy demo in a controlled environment, with much more “brittle and occasionally erratic”[5] behavior in the wild.

    The problem isn’t the technology itself—it’s the assumption that automation is effortless.

    A Demo Is Not Reality

    Like my manager, who saw a smooth test automation demo and assumed we could apply it to every test, many will see the Operator demo and believe AI agents are ready to replace manual effort for every use case.

    The question isn’t whether Operator can automate tasks—it clearly can. But the real challenge isn’t innovation—it’s the misalignment between expectations and the realities of implementation.

    Real-world implementation is messy. Moving beyond controlled conditions, you run into exceptions, edge cases, and failure modes requiring human intervention. It isn’t clear if companies understand the investment required to make automation work in the real world. Without that effort, automation promises will remain just that—promises.

    Many companies don’t fail at automation because the tools don’t work—they fail because they get distracted by the illusion of effortless automation. Without investment in infrastructure, data, and disciplined execution, agents like Operator won’t just fail to deliver results—they’ll pull focus away from the work that matters.

    1. https://help.openai.com/en/articles/10421097-operator ↩︎
    2. CAPTCHA (Completely Automated Public Turing test to tell Computers and Humans Apart) is a security feature used on websites to differentiate between human users and bots. It typically involves challenges like identifying distorted text, selecting specific objects in images, solving simple math problems, or checking a box (“I’m not a robot”). ↩︎
    3. https://www.verdict.co.uk/captcha-recaptcha-bot-detection-ethics/?cf-view ↩︎
    4. https://hackernoon.com/openais-operator-vs-captchas-whos-winning ↩︎
    5. https://www.nytimes.com/2025/02/01/technology/openai-operator-agent.html ↩︎

  • It’s Agentic! (Boogie Woogie, Woogie)

    It’s Agentic! (Boogie Woogie, Woogie)

    You can’t see it
    It’s electric boogie woogie, woogie
    You gotta feel it
    It’s electric boogie woogie, woogie
    Ooooh, it’s shocking
    It’s electric boogie woogie, woogie

    Marcia Griffiths

    Just as the shininess of generative AI started to lose its polish from the reality of trying to use it, a new buzzword has re-entered the lexicon: agentic.

    The term “agentic” refers to AI systems that exhibit autonomy and goal-directed behavior. These systems go beyond simply generating responses or content based on input: they act as agents capable of making decisions, taking actions, and adapting to achieve specific objectives in dynamic environments.

    The concept of autonomous agents is not new. We’re bringing back the classics. In Artificial Intelligence: A Modern Approach (1995), Stuart Russell and Peter Norvig defined an agent as anything that perceives its environment and acts upon it.
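    That perceive-then-act definition fits in a few lines of code. The thermostat below is a hypothetical example chosen for illustration, not one taken from the book, and the class and function names are assumptions.

```python
from dataclasses import dataclass


@dataclass
class Room:
    """The environment: a single temperature that actions can change."""
    temperature: float

    def apply(self, action):
        if action == "heat":
            self.temperature += 1.0
        elif action == "cool":
            self.temperature -= 1.0


class ThermostatAgent:
    """A simple reflex agent: it maps the current percept directly to an action."""

    def __init__(self, target, tolerance=0.5):
        self.target = target
        self.tolerance = tolerance

    def perceive(self, room):
        return room.temperature

    def act(self, percept):
        if percept < self.target - self.tolerance:
            return "heat"
        if percept > self.target + self.tolerance:
            return "cool"
        return "idle"


def run(agent, room, steps):
    """Repeat the perceive-act cycle and record what the agent chose."""
    history = []
    for _ in range(steps):
        action = agent.act(agent.perceive(room))
        room.apply(action)
        history.append(action)
    return history


room = Room(temperature=18.0)
history = run(ThermostatAgent(target=21.0), room, steps=5)
print(history, room.temperature)
```

    Today’s agentic systems replace the hard-coded rule in act with a model that plans and chooses among tools, but the loop around it is the same.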

    The idea that we can automate routine tasks to free up people to do more challenging, more creative tasks is noble and has been a rallying cry of computers and software since their inception. In his 1945 essay “As We May Think,” Vannevar Bush imagined a device called the “Memex” to help humans store, organize, and retrieve information efficiently, aiming to reduce mental drudgery and aid creativity.

    Computers were first used to automate repetitive, time-consuming industrial tasks, especially in manufacturing. Early pioneers recognized that this freed humans for more complex supervisory roles.

    As computers and software became more accessible, researchers explored “expert systems” that were designed to take over repetitive knowledge-based tasks to allow professionals to focus on more challenging problems.

    Today, generative AI tools like ChatGPT, GitHub Copilot, and others are attempting to fully realize this concept by automating tasks like writing, coding, design, and data analysis, allowing humans to concentrate on strategy, creativity, and innovation.

    But since the mainstream generative AI boom in 2022, which saw the public availability of ChatGPT and GitHub Copilot, and the expansion in 2023 and 2024 with more competition in generative AI, the challenges of bringing this technology into the enterprise and driving meaningful value have dampened the early enthusiasm.

    That’s not to say that generative AI hasn’t been valuable. Many enterprises report productivity gains[1,2] from early use cases like code generation, knowledge management, content generation, and marketing. However, several challenges have made it more difficult to scale and adopt generative AI more broadly.

    Hallucinations, where outputs are factually incorrect, fabricated, or nonsensical despite appearing plausible or confident, introduce risk and distrust into generative AI solutions. Toxic and harmful language, including encouragement of self-harm and suicide[3,4], further exposes companies to risk and reduces their interest in putting generative AI output directly in front of their customers.

    Introducing generative AI has also highlighted systemic internal issues. Knowledge management use cases were seen as straightforward, low-risk ways to leverage generative AI. For example, retrieval-augmented generation (RAG) allows users to get context-aware answers by combining AI-generated content with real-time retrieval of relevant information from sources like internal documents and databases.

    But what happens when that documentation is missing, outdated, or incorrect? What if the documentation is ambiguous or contradicts itself? What if the documentation is not in a format easily consumed by the RAG system? These generative AI solutions are not a technical solution to poor documentation and data. As the saying goes, “Garbage in. Garbage out.”
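    A toy sketch makes the dependency on documentation concrete. Real RAG systems usually retrieve with embeddings and a vector index and then pass the prompt to a language model; the keyword-overlap retrieval below is a deliberate simplification, and the documents, stopword list, and function names are all assumptions.

```python
import re

# Small, hypothetical stopword list so common words do not count as matches.
STOPWORDS = {"the", "a", "is", "are", "what", "how", "do", "i", "my",
             "to", "of", "on", "for"}


def keywords(text):
    """Lowercase the text and keep the words that carry meaning."""
    return set(re.findall(r"[a-z0-9]+", text.lower())) - STOPWORDS


def retrieve(question, documents, top_k=1):
    """Rank documents by how many keywords they share with the question."""
    question_words = keywords(question)
    scored = [(len(question_words & keywords(doc)), doc) for doc in documents]
    scored = [item for item in scored if item[0] > 0]
    scored.sort(key=lambda item: item[0], reverse=True)
    return [doc for _score, doc in scored[:top_k]]


def build_prompt(question, documents):
    """Assemble the prompt a model would receive, or None if nothing was found."""
    context = retrieve(question, documents)
    if not context:
        return None  # no documentation to ground an answer in
    return ("Answer using only this context:\n" + "\n".join(context)
            + "\n\nQuestion: " + question)


documents = [
    "To reset the cable box, hold the power button for ten seconds.",
    "Billing statements are issued on the first day of each month.",
]

prompt = build_prompt("How do I reset my cable box?", documents)
missing = build_prompt("What is the refund policy?", documents)
print(prompt)
print(missing)
```

    The second question returns nothing because no document covers refunds. A system that does not check for that case hands the model an empty or irrelevant context and invites a confident guess. If the reset instructions were outdated, retrieval would still succeed and the wrong steps would go straight into the prompt.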

    While the challenges above are relevant to most generative AI implementations, one that applies specifically to agentic AI and agents relates to business processes.

    Agentic AI relies on well-defined tasks, workflows, goals, and a clear understanding of how processes operate to function effectively. If processes are unclear or undocumented and data is inconsistent, incomplete, or unavailable, the AI may struggle to execute tasks properly or optimize workflows.

    Generative AI and agents are not a technical solution to inefficient processes, just as AI isn’t a solution to bad data. Automating an inefficient process with an agent could reinforce and scale those inefficiencies, creating more bottlenecks or errors.

    Companies often prefer introducing new technology rather than spending resources updating outdated documentation or optimizing processes. These challenges highlight the risks of skipping those steps, especially when agents can execute transactions automatically and interact directly with customers. The possibilities of financial and reputational damage by a wayward agent are dangerously real.

    However, the lure of automation and operational efficiency is strong, and the landscape of offerings in the agentic AI space continues to grow. In 2024, the market size was estimated to be between $5 billion and $31 billion[5,6]. By 2032, the market is projected to reach approximately $48.5 billion[7]. Like a lottery jackpot, that dollar figure is causing companies to forget the struggles of implementing non-agentic generative AI in pursuit of a big payoff through automation. But what opportunities are missed to improve business and customer outcomes without agents (or even AI) while chasing that payoff?

    That’s not to say there isn’t a place for agentic AI. Similar to the gains seen from generative AI, the ecosystem of conversational, natural language, multimodal, and adaptive agents can be a powerful tool to solve complex problems and drive value. However, it will take time because work must be done before this value can be fully realized. Paraphrasing a quote, the road to generative AI (and agentic AI) is clearer than ever before, but it’s much longer than we thought.

    Recommendations

    While we travel that road to the promised land, there are a few areas companies can focus on to prepare for an agentic world:

    Invest in documentation management and data quality. If previous AI projects failed due to poor documentation or data, an AI agent will likely have the same fate as its predecessor. Companies may see incremental gains through this effort because it’s likely that poor data and documentation are creating inefficiencies. For example, poor documentation can cause support agents to struggle to find answers, causing longer handling times.

    Invest in process optimization. The simpler a process is, the more likely it can be automated. I’ve found that companies want to keep their complex processes, which humans often find challenging to navigate, and think that automating them is a faster path to efficiency gains. The reality, however, is that complex processes have a long tail of edge cases that cause automation to break down, require extensive troubleshooting and tuning, and cancel out value.

    Simplify architectures and APIs. One aspect of autonomy for agentic AI is access to tools and functions that the agent can execute to act. An agent cannot effectively utilize complex APIs that wrap multiple functions and are not well-instrumented.

    Focus on risk mitigation. As mentioned above, generative AI and agentic AI introduce risks, including hallucinations, toxic and harmful language, and a lack of oversight and controls. If the best time to plant a tree was 20 years ago, the best time to implement guardrails and controls is before introducing agents. As business processes are reviewed, optimized, and documented, attention should be paid to identifying and securing vulnerable points.

    Identify small use cases with low risk and high value. It can be tempting to throw agentic AI at the biggest or most expensive problem to maximize the return on investment. However, starting with complex, high-stakes processes increases the likelihood of errors, inefficiencies, and stakeholder resistance. Instead, focus on areas where agentic AI can deliver quick wins. This approach allows teams to refine their understanding of the technology, build trust, and develop best practices before scaling to more critical or complex use cases.

    Consider non-agentic and non-AI solutions. Of all the recommendations, this one will likely generate the most resistance as companies are pushing towards the promise of generative AI and agentic AI solving all problems. Improving customer service or reducing call volume through better internal documentation or website search won’t generate enough buzz to show up in a news feed. There is so much pressure to find problems that can be solved with generative AI, forcing a solution-first, technology-first mindset. Ultimately, it should never be about the technology. It should be about the outcomes. It should be about improving our customers’ and employees’ lives and experiences and the value we bring to the business. Start with a problem or pain point and work backward. Consider all possible solutions, and choose the one most likely to succeed, even if it doesn’t feed the hype.
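    Two of the recommendations above, simple single-purpose tools and guardrails put in place before agents arrive, can be sketched together. This is an illustration under stated assumptions, not a reference to any particular agent framework: the registry, the tool names, and the approval flag are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class Tool:
    """One narrow capability an agent may call."""
    name: str
    description: str
    run: Callable[..., str]
    requires_approval: bool = False  # guardrail for risky actions


class ToolRegistry:
    """Holds the tools an agent may use and records every call it makes."""

    def __init__(self):
        self._tools: Dict[str, Tool] = {}
        self.audit_log = []

    def register(self, tool):
        self._tools[tool.name] = tool

    def call(self, name, approved=False, **kwargs):
        tool = self._tools.get(name)
        if tool is None:
            outcome = "rejected: unknown tool"
        elif tool.requires_approval and not approved:
            outcome = "blocked: human approval required"
        else:
            outcome = tool.run(**kwargs)
        self.audit_log.append((name, kwargs, outcome))
        return outcome


def look_up_order(order_id):
    return f"order {order_id}: shipped"


def issue_refund(order_id, amount):
    return f"refunded {amount:.2f} for order {order_id}"


registry = ToolRegistry()
registry.register(Tool("look_up_order", "Read the status of one order.", look_up_order))
registry.register(Tool("issue_refund", "Refund one order.", issue_refund,
                       requires_approval=True))

status = registry.call("look_up_order", order_id="A100")
blocked = registry.call("issue_refund", order_id="A100", amount=25.0)
refund = registry.call("issue_refund", approved=True, order_id="A100", amount=25.0)
print(status, blocked, refund, sep="\n")
```

    Each tool does one thing, so its description is easy for an agent to act on. The read-only lookup runs freely, the refund waits for a human, and the audit log captures both, which is the kind of instrumentation a wayward agent would otherwise lack.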

    Conclusion

    While the allure of agentic AI is undeniable, achieving its promised potential requires deliberate preparation, thoughtful execution, and a focus on foundational improvements.

    Companies must resist the urge to chase the hype and prioritize efforts that enhance data quality, streamline processes, and establish robust risk mitigation strategies.

    Starting with low-risk, high-value use cases can build momentum, trust, and a clear path to scalable adoption. At the same time, leaders should remain open to non-agentic and non-AI solutions that more effectively and sustainably address pain points.

    Ultimately, the goal should not be to implement the latest technology for its own sake but to deliver meaningful outcomes that enhance customer experiences, empower employees, and drive long-term business value.

    The journey toward agentic AI may be longer and more complex than we thought, but with the right approach, we can significantly increase the likelihood of realizing its full value.

    You can’t see it. You gotta feel it. Ooooh, it’s shocking. It’s agentic.

    Footnotes

    1. https://cloud.google.com/resources/roi-of-generative-ai ↩︎
    2. https://www.wsj.com/articles/its-time-for-ai-to-start-making-money-for-businesses-can-it-b476c754 ↩︎
    3. https://gemini.google.com/share/6d141b742a13 ↩︎
    4. https://apnews.com/article/chatbot-ai-lawsuit-suicide-teen-artificial-intelligence-9d48adc572100822fdbc3c90d1456bd0 ↩︎
    5. https://www.emergenresearch.com/industry-report/agentic-artificial-intelligence-market ↩︎
    6. The wide range in the 2024 figure is likely due to differing methodologies or market definitions. ↩︎
    7. https://dataintelo.com/report/agentic-ai-market ↩︎
  • The Humanity In Artificial Intelligence

    The Humanity In Artificial Intelligence

    I wrote this essay in 2017. When I restarted the blog, I removed the posts that had already been published. But after rereading this one, I found that, while the technology has advanced significantly since then, the sentiment still applies today.

    Dave, January 2025


    Algorithms, artificial intelligence, and machine learning are not new concepts. But they are finding new applications. Wherever there is data, engineers are building systems to make sense of that data. Wherever there is an opportunity for a machine to make a decision, engineers are building it. It could be for simple, low-risk decisions to free up a human to make a more complicated decision. Or it could be because there is too much data for a human to decide. Data-driven algorithms are making more decisions in many areas of our lives.

    Algorithms already decide what search results we see. They determine our driving routes or assign us the closest Lyft, and soon, they will enable self-driving cars and other autonomous vehicles. They’re matching job candidates with openings. They recommend the next movie you should watch or the product you should buy. They’re figuring out which houses to show you and whether you can pay the mortgage. The more data we feed them, the more they learn about us, and they are getting better at judging our mood and intention to predict our behavior.

    I’ve been thinking a lot about these systems lately. My son has epilepsy, and I’m working on a project to gauge the sentiment towards epilepsy on social media. I’m scraping epilepsy-related tweets from Twitter and feeding them to a sentiment analyzer. The system calculates a score representing whether an opinion is positive, negative, or neutral.

    Companies already use sentiment analysis to understand their relationships with customers. They analyze reviews and social media mentions to measure the effectiveness of an ad. They can inspect negative comments and find ways to improve a product. They can also see when a public relations incident turns against them.

    For the epilepsy project, my initial goal was to track sentiment over time. I wanted to see why people were using Twitter to discuss epilepsy. Were they sharing positive stories, or were they sharing hardships and challenges? I also wanted to know whether people responded more to positive or negative tweets.

    While the potential is there, the technology may not be quite ready. These systems aren’t perfect, and context and the complexities of human expression can confuse even humans. To an immature algorithm, “I [expletive] love epilepsy” may seem to express a positive sentiment, and any system built on top of such algorithms is limited by them.

    I considered this as I compared two sentiment analyzers. They gave me different answers for tweets that expressed a negative sentiment. Of course, which was “right” could be subjective, but most reasonable people would have agreed that the tone of the text was negative.
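    It is easy to see how two analyzers can disagree. The sketch below uses two toy scorers, one that only counts words and one that also handles simple negation. They are not the analyzers compared for this project, and the word lists and example tweets are assumptions made up for illustration.

```python
import re

# Tiny hypothetical lexicons; real analyzers use far larger ones or trained models.
POSITIVE = {"love", "great", "hope", "happy"}
NEGATIVE = {"hate", "awful", "scared", "tired"}
NEGATIONS = {"not", "never", "no"}


def tokens(text):
    return re.findall(r"[a-z']+", text.lower())


def label(score):
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def naive_sentiment(text):
    """Count positive and negative words and ignore everything else."""
    score = 0
    for word in tokens(text):
        if word in POSITIVE:
            score += 1
        elif word in NEGATIVE:
            score -= 1
    return label(score)


def negation_aware_sentiment(text):
    """Same lexicon, but a negation flips the next sentiment word it meets."""
    score = 0
    negate = False
    for word in tokens(text):
        if word in NEGATIONS:
            negate = True
            continue
        value = 1 if word in POSITIVE else -1 if word in NEGATIVE else 0
        if value:
            score += -value if negate else value
            negate = False
    return label(score)


tweet = "I do not love what epilepsy does to my son"
sarcastic = "I [expletive] love epilepsy"

print(naive_sentiment(tweet), negation_aware_sentiment(tweet))
print(naive_sentiment(sarcastic), negation_aware_sentiment(sarcastic))
```

    The two scorers split on the first tweet because only one of them notices the “not.” Both call the sarcastic tweet positive, since nothing in a word list captures tone.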

    Like a child, a system sometimes gets a wrong answer because it hasn’t learned enough to know the right one. This was likely the case in my example. The answer given was likely due to limitations in the algorithm. Still, imagine if I built my system to predict the mood of a patient using an immature algorithm. When the foundation is wrong, the house will crumble.

    But, also like a child, sometimes they give an answer because a parent taught them that answer. Whether through explicit coding choices or biased data sets, systems can “learn wrong”. After all, people created these systems—people, with their logic and ingenuity, but also their biases and flaws. A human told it that an answer was right or wrong. A human with a viewpoint. Or a human with an agenda.

    We create these systems with branches of code and then teach them which branch to follow. We let them learn and show enough proficiency, and then we trust them to keep getting better. We create new systems and give them more responsibility. But somewhere, back in the beginning, a fallible human wrote that first line of code. It is impossible for those actions not to influence every outcome.

    These systems will continue to be pervasive, reaching into new areas of our lives. We’ll continue to depend on and trust them because they make our lives easier. And because they get it right most of the time. The danger is assuming they always get it right and not questioning an answer that feels wrong. “The machine gave me the answer, so it must be true” is a dangerous statement, now more than ever.

    We dehumanize these programs once they encounter the cold metal box in which they run. However, they are extensions of our humanity, and it’s important to remember their human origins.