Summary: Most thought leadership on AI in marketing is written by people who don’t actually run a marketing team. This isn’t that. We run upGrowth Digital, a 32-person agency serving 150+ clients across SaaS, fintech, D2C, and healthcare. Over the last 18 months we rebuilt our internal workflow three times as AI tools matured. This article walks through what actually changed operationally, which metrics moved, one experiment we abandoned, and what clients now ask us that they didn’t ask two years ago. If you run an agency, lead in-house marketing, or evaluate agencies as a CMO, this is the uncensored version.
When Claude Code launched in 2024 and Claude Design shipped this April, most of the public commentary came from people outside the day-to-day of running a marketing team. The takes sound confident and usually miss what actually happens when you try to bolt AI onto a live services business with payroll to meet and clients expecting deliverables on Monday. At upGrowth Digital, we’ve been in the middle of that rewiring for 18 months. This is what actually happened, with the numbers we can share.
Context first. We’re a 32-person agency. We serve over 150 active clients. Our verticals are SaaS, fintech, D2C, healthcare, and edtech, with a significant presence in Dubai and the broader GCC. Our largest case studies are Lendingkart (5.7x lead volume, 30 percent CPL reduction, 4x spend scaling), Delicut in Dubai (20K AED per month to 2M AED per month), and a stack of mid-market fintech and SaaS brands most people haven’t heard of because we don’t publish every win. We started the AI rebuild in late 2024 and we’re still rebuilding.
What follows is what a competitor agency would want to know and what a CMO evaluating us would want to hear before signing a contract. I’ll cover what changed in the work, what changed in the team, what changed in the pricing conversation, one thing we tried and abandoned, and what’s still hard. If you want the sanitized case study version, this isn’t it.
Three things rewired the production side of the agency. None of them were the obvious “let’s use ChatGPT for content” move that most agencies started with and then stopped at.
The first was the first-draft layer. Starting in late 2024, every piece of content, every ad variant, every reporting narrative passes through an AI-first draft before a human touches it. This sounds trivial. It’s not. Getting the prompts right, the reference material curated, the brand voice training dialed in, and the QA checklist disciplined took about six months of iteration. The first version produced generic content that our senior editors rewrote from scratch, which was slower than pure human work. The current version produces drafts that land at roughly 70 percent of the final quality, and human editors spend their time on the 30 percent that matters (the angle, the nuance, the client-specific insights). Turnaround on a long-form blog article dropped from 8 working days to 3.
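To make the shape concrete, here is a simplified sketch of what a first-draft request can look like once the prompts, reference material, brand voice, and QA checklist are treated as structured inputs rather than ad hoc prompting. None of these names are our internal schema; the BrandVoice fields, the checklist, and the prompt assembly are illustrative placeholders for whichever drafting tool you use.

```typescript
// Illustrative only: a structured first-draft request assembled before any
// model call. All names here are hypothetical, not our production schema.

interface BrandVoice {
  tone: string;              // e.g. "direct, operator-level, no filler"
  bannedPhrases: string[];   // phrases the editor would strip anyway
  exampleExcerpts: string[]; // a few approved excerpts that anchor style
}

interface DraftRequest {
  client: string;
  workingTitle: string;
  angle: string;         // the human-supplied thesis; never AI-generated
  references: string[];  // curated source material, not raw search results
  voice: BrandVoice;
  qaChecklist: string[]; // what the human editor will check the draft against
}

function buildDraftPrompt(req: DraftRequest): string {
  // The model only ever sees a fully assembled brief, never a bare topic.
  return [
    `Write a first draft titled "${req.workingTitle}" for ${req.client}.`,
    `Angle (do not deviate): ${req.angle}`,
    `Voice: ${req.voice.tone}. Avoid: ${req.voice.bannedPhrases.join(", ")}.`,
    `Match the style of these excerpts:\n${req.voice.exampleExcerpts.join("\n---\n")}`,
    `Use only these references:\n${req.references.join("\n")}`,
    `The draft will be reviewed against this checklist:\n- ${req.qaChecklist.join("\n- ")}`,
  ].join("\n\n");
}
```

The point of structuring it this way is that the six months of iteration live in the inputs, not in heroic one-off prompting by individual writers.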
The second was the measurement and reporting layer. We built internal agents that pull data from GA4, Google Search Console, Meta, LinkedIn, and client CRMs and produce first-pass performance narratives. The first version was wrong more often than it was right, which taught us that data quality is the binding constraint, not AI capability. We spent four months cleaning data pipelines before the AI layer produced anything useful. Once the pipelines were clean, a weekly client report that used to take an account manager 3 hours to prepare now takes 25 minutes. The manager’s time shifted from data pulling to insight generation.
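The structure matters more than the tooling, so here is a bare-bones sketch of that reporting flow under stated assumptions: the fetchers and the narrative-drafting call are placeholders for your own connectors and model, and the one thing we would insist on is that incomplete data fails loudly instead of quietly producing a plausible-sounding report.

```typescript
// Structural sketch only. The fetchers stand in for whatever connectors you
// already run (GA4 / GSC / Meta exports, CRM API); the shape is the point:
// validated data in, a drafted narrative out, a human reviewing before send.

interface ChannelMetrics {
  source: "ga4" | "gsc" | "meta" | "linkedin" | "crm";
  period: string;                  // e.g. "2026-W07"
  metrics: Record<string, number>; // already validated upstream
}

type Fetcher = (clientId: string, period: string) => Promise<ChannelMetrics>;

async function buildWeeklyReportDraft(
  clientId: string,
  period: string,
  fetchers: Fetcher[],
  draftNarrative: (data: ChannelMetrics[]) => Promise<string>, // model call lives here
): Promise<{ data: ChannelMetrics[]; narrativeDraft: string }> {
  // Pull every channel in parallel; a missing source should stop the report,
  // not let the narrative paper over the gap.
  const data = await Promise.all(fetchers.map((f) => f(clientId, period)));

  const missing = data.filter((d) => Object.keys(d.metrics).length === 0);
  if (missing.length > 0) {
    throw new Error(
      `Incomplete data for ${period}: ${missing.map((m) => m.source).join(", ")}`,
    );
  }

  // First-pass narrative for the account manager to edit, never to send as-is.
  const narrativeDraft = await draftNarrative(data);
  return { data, narrativeDraft };
}
```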
The third was the client-facing research layer. When a prospect fills out our contact form, our system now runs a structured research pipeline before anyone on our team sees the inquiry. Company research, founder background, recent news, likely pain points, and a preliminary fit score are all compiled in under a minute. The lead response email that reaches the prospect is written with this context baked in, which changes the response rate significantly. We moved from a model where a senior person spent 15 minutes per lead deciding whether to respond to one where that same person spends 2 minutes reviewing an already-drafted response. Our inbound close rate approximately doubled over 12 months, though I can’t fully attribute the lift because we changed other variables at the same time.
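For readers who want the shape rather than the pitch, the pipeline looks roughly like the sketch below. Every name in it is hypothetical; the research, scoring, and drafting steps stand in for whatever tools you wire together, and the output is always a draft that a senior person reviews before anything reaches the prospect.

```typescript
// Hypothetical shape of an inbound research pipeline, not our production code.
// The thing worth copying is that the output is a briefing plus a *draft*
// reply, and a human still owns the final call.

interface Inquiry {
  email: string;
  company: string;
  message: string;
}

interface ResearchBrief {
  companySummary: string;
  founderBackground: string;
  recentNews: string[];
  likelyPainPoints: string[];
}

interface PreparedLead extends ResearchBrief {
  fitScore: number;   // 0-100, preliminary only
  draftReply: string; // written with the brief baked in, reviewed before sending
}

async function prepareLead(
  inquiry: Inquiry,
  research: (i: Inquiry) => Promise<ResearchBrief>,
  scoreFit: (b: ResearchBrief) => number,
  draft: (i: Inquiry, b: ResearchBrief) => Promise<string>,
): Promise<PreparedLead> {
  const brief = await research(inquiry);          // company, founder, news, pain points
  const fitScore = scoreFit(brief);               // rules or a model, either works
  const draftReply = await draft(inquiry, brief); // response with context baked in
  // A senior person spends ~2 minutes on this object instead of 15 from scratch.
  return { ...brief, fitScore, draftReply };
}
```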
Also Read: Agency Fit Score: Which Marketing Agency Model Fits Your Company?
We didn’t reduce headcount. We’re still 32 people. What changed is the ratio of what they do.
Two years ago, roughly 70 percent of our team’s time went to production work (writing, designing, building reports, running campaigns) and 30 percent went to what we call “non-production” work (strategy, client calls, internal tooling, relationship work, hiring). Today that ratio is closer to 40 percent production and 60 percent non-production. The production work is still happening, but AI does a larger share of the keystrokes and our team spends more time on judgment, client conversations, and building internal systems that compound.
We also restructured who does what. We used to have a clear hierarchy: junior writers, senior writers, content leads, account managers, strategy leads. That hierarchy still exists on paper, but the work distribution is different. Junior team members now act more as editors and QA operators on AI output than as first-draft producers. Senior team members spend more time on client strategy and less on reviewing junior work. The middle layer (content leads, performance leads) got more strategic because the operational load on them dropped.
Hiring is the hardest part of this transition and I’ll be honest about what we got wrong. We tried hiring “AI-fluent” mid-level marketers for about six months in 2025 and most of them underperformed. The signal we were screening for (comfort with prompting, familiarity with tools) turned out to be a weak predictor of actual performance. What mattered more was taste, judgment, and the ability to critique AI output rigorously. We shifted our hiring to prioritize those traits and deprioritize “AI tool fluency” because the latter is now easy to teach and the former isn’t.
The people who thrive on our current team are ones who are opinionated about quality, willing to disagree with AI output rather than just accept it, and genuinely interested in client business outcomes rather than in craft for its own sake. That profile hasn’t changed much from the pre-AI era, interestingly. Good marketers are still good marketers. The tooling around them changed; the underlying capability didn’t.
Our average retainer size went up, not down, over the last 18 months. That surprises most people.
The reason is that we restructured what the retainer buys. Two years ago a typical mid-market fintech retainer with us was around 4 lakh per month and covered a defined scope of deliverables: a certain number of articles, a certain amount of paid media management, a certain reporting cadence. The client bought outputs. Today a comparable engagement runs closer to 5.5 to 7 lakh per month and the scope is framed around a growth motion rather than outputs. “Scale organic demand capture for our enterprise segment” or “drive qualified SQL volume from LinkedIn” is the scope. The outputs underneath are variable and we optimize them.
The retainer increased because the work underneath got more valuable. Strategy, relationships, measurement, and accountability are all priced higher than content production used to be. We can serve more clients per senior person because AI handles execution leverage, but we charge more per engagement because the work is upstream of production. The clients who’ve made this switch with us (most of our existing roster) haven’t pushed back because they’re getting outcomes they weren’t getting at the old scope. The ones who want to buy outputs at the old rates either went elsewhere or we transitioned them out. That’s fine. Not every client is the right fit for the new model.
We also added performance-tied compensation on about 40 percent of our engagements. Typically 10 to 20 percent of the fee is tied to an outcome metric (ROAS, CPL, organic revenue, qualified pipeline) with a clear definition and a cap. This created internal discipline that flat retainers didn’t. When a client asks us to do work we think won’t move the outcome, we push back harder because our margin is on the line. That friction produces better decisions on both sides.
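A simplified, hypothetical example of how the math can work, assuming a linear ramp against a CPL target; the split, the numbers, and the ramp are illustrative, and the real definition varies per contract but always carries a cap.

```typescript
// Hedged example of a performance-tied fee calculation. The 80/20 split, the
// CPL target, and the linear ramp are illustrative assumptions, not a template.

interface PerformanceTerms {
  baseFee: number;       // fixed portion, e.g. 80% of the headline retainer
  variableFee: number;   // at-risk portion, e.g. 20%
  targetCPL: number;     // agreed outcome metric, here cost per lead
  capMultiplier: number; // e.g. 1.25 — variable payout can never exceed this
}

function monthlyFee(terms: PerformanceTerms, actualCPL: number): number {
  // Beating the CPL target earns more of the variable fee, up to the cap;
  // missing it earns less, down to zero.
  const performance = terms.targetCPL / actualCPL;
  const earned =
    terms.variableFee * Math.min(Math.max(performance, 0), terms.capMultiplier);
  return terms.baseFee + earned;
}

// e.g. a 6 lakh retainer split 80/20 against a 400-rupee CPL target,
// with an actual CPL of 320:
// monthlyFee({ baseFee: 480000, variableFee: 120000, targetCPL: 400, capMultiplier: 1.25 }, 320)
// => 480000 + 150000 = 630000
```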
Also Read: Why Performance-Tied Agency Compensation Is Becoming the Default in 2026
For about four months in 2025, we tried to build a fully autonomous lead response agent that would handle inbound inquiries end-to-end without a human in the loop. The pitch to ourselves was that we could scale inbound handling to infinity with no marginal cost per lead. We built it, deployed it, and watched it fail in ways that taught us more than the successes did.
The first failure was tone. Even with extensive brand voice training, the fully autonomous responses had a sameness that prospects picked up on. We started getting feedback like “felt automated” and “didn’t seem to really read my brief.” The responses were technically correct. They just didn’t land the way a human response would.
The second failure was context. The agent made accurate surface-level observations about the prospect’s company but missed the judgment calls that matter in a sales conversation. When to push back on a brief. When to offer something unexpected. When to say no and refer them elsewhere. These are judgment calls AI didn’t handle well in that specific context, and the result was a response pipeline that converted at about a 30 percent lower rate than our human-in-the-loop version.
We pulled it after four months and rebuilt the current version, which uses AI for research and drafting but keeps a senior person in the loop for review and personalization. That hybrid model converts better than either pure-human or pure-AI did. The lesson we took from this was specific: AI is great at augmentation and bad at autonomy when the work requires judgment under incomplete information. We now default to augmentation everywhere and only consider autonomy for tasks with clear right answers.
The nature of client conversations shifted in predictable ways as the market caught up to what AI can do.
The first new question we get is “how are you using AI in our workflow.” Two years ago this question was theoretical and clients trusted us to figure it out. Now it’s specific and they want concrete examples. We walk them through the first-draft layer, the measurement layer, the research layer, and we name the tools. Transparency here matters more than it used to, because clients are calibrating their spend based on how much AI leverage they’re paying for. Vague answers cost us deals.
The second new question is “why are you charging more than last year if AI is making this cheaper.” This one required us to reframe how we explain the work. The answer is that the production cost dropped but the strategy and judgment cost didn’t, and we’ve restructured what the fee buys. Clients who understand the distinction stay. The ones who expect a pricing cut because “AI should be cheaper” usually don’t, and we don’t try hard to keep them. The economics don’t work.
The third new question is “can you guarantee revenue outcomes.” This one we still decline. We can guarantee what we control (ad spend, uptime, reporting quality, response time) and we’re willing to tie fees to outcomes we can influence (CPL, ROAS, qualified leads). We don’t guarantee revenue because too much of it sits with the client’s sales team, product, pricing, and timing. Agencies that guarantee revenue outcomes in 2026 are either overpromising or running a different business model than we do. We’d rather lose the occasional deal than commit to something we can’t deliver.
The fourth new question, and this one is interesting, is “who on your team is actually making the decisions.” Clients are increasingly aware that the human judgment layer is where the value sits, and they want to know the specific people whose thinking they’re renting. This has made senior team visibility more important in the sales process. The partner or strategy lead who shows up on the first call needs to be the same person who’ll be present through the engagement. Bait-and-switch staffing (senior person sells, junior person delivers) doesn’t work anymore because clients can tell the difference.
I want to be honest about what hasn’t been solved yet, because the triumphant “we rebuilt the agency with AI” narrative has a lot of survivorship bias in it.
Onboarding new clients is still hard. Getting a new client’s data, brand voice, positioning, and operational context loaded into our AI pipelines takes between four and eight weeks, and during that window the AI leverage doesn’t work well. We’ve shortened it from six months originally, but onboarding drag is real and we haven’t found a clean solution. The first month with a new client is still mostly human-intensive.
Data quality is a persistent issue. About 30 percent of the time we onboard a client, their analytics are misconfigured in ways that make AI-layered measurement less reliable. Cleaning this takes time that clients don’t want to pay for but the measurement work depends on. We’ve started including a paid data audit as a prerequisite for most engagements, which some prospects reject. That’s a trade we’ve made peace with.
Model and tool changes break things constantly. When a major AI tool updates its API or shifts its quality, some percentage of our internal workflows break or degrade. We’ve built a small internal QA team whose job is to monitor for these shifts and update the affected workflows. This is an ongoing cost that doesn’t show up in the marketing narrative about AI efficiency.
Hiring senior talent is harder than it was. The pool of people who have both strategic judgment and comfort with AI-augmented workflows is small and getting competed over by in-house teams and well-funded startups. We’re paying more for senior hires than we were, which shows up in our margin. The alternative is not hiring, which shows up in our capacity. We chose the higher cost.
Also Read: How upGrowth Thinks About AI-Augmented Marketing Strategy for Mid-Market Brands
From conversations with other agency founders and operators, there are three common patterns I’d call out.
The first is layering AI on top of unchanged workflows. An agency buys ChatGPT licenses for the team, runs a few prompt engineering workshops, and expects efficiency gains. The gains don’t materialize because the underlying workflow wasn’t designed around AI. This produces the worst of both worlds: you pay for the tools and your team is slower than before because they’re context-switching between old processes and new. The only fix is a from-scratch workflow rebuild, which is disruptive and expensive, and most agencies aren’t willing to do it.
The second is pricing the output lower instead of pricing the work differently. Agencies in this bucket assume AI makes the work cheaper, so they cut rates 20 percent to stay competitive. This doesn’t work because it misdiagnoses the value. The production cost dropped but the strategy cost didn’t. Price cuts commoditize the agency and compress margin to unsustainable levels. The move is to restructure what the fee buys, not to cut the fee for the same thing.
The third is ignoring the measurement and accountability layer entirely. Most agencies we see pitching AI integration talk about content production and maybe creative production. Very few talk about how AI has changed the measurement and reporting cadence, or how they’re using AI to tie work to outcomes. This is where the most durable value lives and where most of the industry is still behind. The agency that gets measurement and accountability right has a moat the content production agencies don’t.
Q: How long did the workflow rebuild take?
A: About 18 months from start to current state, and we’re still rebuilding. The first three months produced no visible gains because we were redesigning from scratch. Months four through twelve produced inconsistent gains as we iterated. Months thirteen through eighteen were when the compounding started. Most agencies stop at month three because the gains haven’t shown up yet, which is the exact wrong time to stop.
Q: What’s your margin now versus two years ago?
A: Margin went up modestly, roughly 5 to 8 percentage points, despite higher senior compensation. The lift came from serving more clients per senior person (leverage on the senior layer) and from performance-tied compensation kicking in on well-executed engagements. The increase wasn’t from reducing headcount or cutting costs aggressively.
Q: What AI tools do you actually use day-to-day?
A: The stack changes quarterly. Current anchors are Claude for most drafting and reasoning, custom Apps Script pipelines for lead handling and reporting, internal skill libraries that orchestrate Claude for specific workflows, and standard measurement tools like GA4, GSC, and Meta that we’ve wrapped with AI-layered reporting. Tool selection is less important than workflow design. An agency using best-in-class tools with a bad workflow will lose to an agency using adequate tools with a great workflow.
Q: Would you recommend in-house marketing teams try to replicate this?
A: For companies above about 30M USD in revenue with a real marketing team, yes, partially. The first-draft layer and the reporting layer are feasible internally with 6 to 12 months of focused effort. The research layer is harder because it requires external data pipelines. For companies below that size, the ROI on building internal AI infrastructure usually isn’t there, and an AI-native operator agency is the better path. The math changes at scale.
Q: What’s the single biggest mistake other agencies make when trying to rebuild around AI?
A: Treating it as a tool problem rather than a workflow problem. Most agencies buy AI tools and keep their workflow, then are disappointed when gains are modest. The tools are commoditized. The workflow is the moat. Rebuilding the workflow is disruptive, takes 12 to 18 months, requires pissing off some clients and team members along the way, and produces outsized gains if you stay the course. Most don’t stay the course.
If you run an agency, the question is whether to start the workflow rebuild now or keep pretending AI is just a tool to add to your stack. The rebuild is hard and the payoff is 12 to 18 months out. Not rebuilding means slowly losing competitive position to agencies that did. There’s no middle path that actually works, as far as I’ve seen.
If you’re a CMO or founder evaluating agencies, the question is whether your current agency has done this rebuild or is still layering AI on top of unchanged workflows. The diagnostic questions in the “Three Signs Your Agency Contract Is Obsolete” section of our previous article are a useful filter. The Agency Fit Score diagnostic is a faster one.
If you want to have a specific conversation about what rebuilding a marketing function around AI looks like for your company, we run a 30-minute operator conversation for qualifying mid-market companies. No deck, no pitch, just a working session on where your marketing function sits and where it should go. Book your operator conversation here.
About the Author: I’m Amol Ghemud, Chief Growth Officer at upGrowth Digital. We help SaaS, fintech, and D2C companies shift from traditional SEO to Generative Engine Optimization. This shift has generated 5.7x lead volume increases for clients like Lendingkart and 287% revenue growth for Vance.