Personal opinion, grounded in public sources. No institutional affiliation. Data current through April 2026.
§1 · The billable hour comes apart
Take a mid-complexity transactional legal memo. A first-year associate drafts it in 6–8 hours; a partner reviews it for under an hour. At an associate rate of $450/h and a partner rate of $1,200/h, the client pays $3,000–4,500[1]. The same memo, from an AI-native firm on flat-fee pricing, costs $1,000–2,000[2]: a 2–3× price delta.
Behind the prices, look at the cost. The traditional firm's fully-loaded internal cost per memo (associate + partner review) lands around $2,200–2,500[3]. The AI-native firm's task-level cost (associate + Harvey seat + token consumption) runs roughly $150–300 per memo, or $300–500 fully-loaded[4]. Even at half of Harvey's disclosed productivity upper bound, the cost gap is 4–8×. The billable-hour pricing logic no longer applies. Production cost has decoupled from labor time and is now driven by token consumption.
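The arithmetic behind both deltas is easy to check. A quick sketch, using only the ranges already given in footnotes [1]–[4] (no new data):

```python
# Back-of-envelope check of the per-memo economics described above.
# All figures are the essay's own ranges (footnotes [1]-[4]).

# Client-facing price per memo
price_trad = (3_000, 4_500)     # traditional billable-hour
price_ai = (1_000, 2_000)       # AI-native flat fee
mid_delta = (sum(price_trad) / 2) / (sum(price_ai) / 2)
print(f"midpoint price delta: {mid_delta:.1f}x")      # 2.5x, inside the 2-3x range

# Fully-loaded internal cost per memo
cost_trad = (2_200, 2_500)      # associate + partner review
cost_ai = (300, 500)            # associate + Harvey seat + tokens + review

gap_low = cost_trad[0] / cost_ai[1]    # conservative end: 2200 / 500 = 4.4x
gap_high = cost_trad[1] / cost_ai[0]   # aggressive end:   2500 / 300 ~ 8.3x
print(f"cost gap: {gap_low:.1f}x - {gap_high:.1f}x")  # the 4-8x gap in the text
```

Note that the cost gap (4–8×) is wider than the price gap (2–3×), which is exactly what lets the AI-native firm undercut on price while earning a higher margin.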
When the services dollar no longer splits across partners, associates, rent, and overhead, where does it land? My answer is a three-layer structure I call the Margin Sandwich: the floor (model and compute infrastructure) takes a structural cut; the ceiling is captured by a handful of AI-native firms that sell outcomes rather than time; and the middle tier is compressed from both ends over the long run. Global professional-services-firm (PSF) TAM is approximately $3.9 trillion[5]. That $3.9T is what will be redistributed.
V1 · Where the services dollar actually goes
Same mid-complexity memo, two business models. Both bars scaled to $1 so you can compare share directly.
V5 · The frontier doesn't stay cheap for long
Prices on a log scale. The commodity line just keeps falling. The frontier line doesn't — every new generation pushes it back up. The shaded gap between them is the premium the frontier charges, and it hasn't closed.
§2 · The three layers of the sandwich
Floor · Model and compute infrastructure
The economics here resemble TSMC: high capex, falling unit cost, winners decided by compute scale and frontier model quality. The two leading model companies together generate ~$55B ARR, with combined growth of roughly 3× over the past 12 months; frontier token cost has fallen 10× in 30 months[6]. Every downstream AI inference call lets this layer take a structural cut.
The cut is durable because both supply and demand are locked. On the supply side, the top three (Anthropic 40% / OpenAI 27% / Google 21%) hold 88% of the enterprise LLM market (HHI ~2,770, Menlo Ventures 2025 baseline), backed by ~$433B of 2025 hyperscaler AI capex, with 2026 guidance pointing to $600B+[7]; OpenAI's inference gross margin has reached 70% (latest public disclosure, October 2025)[6]. On the demand side, even with multi-model routing, the workflow itself (agent orchestration, guardrails, data pipelines) stays locked to a specific ecosystem; switching providers means rebuilding the delivery stack. And demand-side AI adoption is competition-driven, not intrinsic: once clients see equivalent outcomes priced at AI-native levels, survival pressure forces the remaining players to shift to token-driven delivery.
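The concentration figure above follows from the standard Herfindahl–Hirschman calculation over the published shares. A sketch using the Menlo Ventures 2025 shares quoted in [7]:

```python
# Reproducing the HHI figure cited above from the published shares (percent).
shares = {"Anthropic": 40, "OpenAI": 27, "Google": 21}

top3_share = sum(shares.values())                  # 88% of the enterprise market
hhi_top3 = sum(s ** 2 for s in shares.values())    # 1600 + 729 + 441 = 2770

print(f"top-3 share: {top3_share}%, HHI (top 3 only): {hhi_top3}")
# US merger guidelines treat HHI above 1,800 (2023 guidelines; 2,500 in the
# 2010 guidelines) as highly concentrated. The long tail's remaining 12% can
# add at most 144 points (12^2, if held by a single firm), so 2,770 is a floor.
```

In other words, the market is highly concentrated under either vintage of the US guidelines, and the true HHI can only be higher than the top-three figure.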
Headlines about "collapsing token cost" describe last-generation models. Differentiating capability comes from frontier models, and model companies always release frontier models at a premium. For service firms that need the frontier to handle complex work, what the model layer offers is a price curve that keeps moving upward. The model layer is therefore a non-trivial structural cost for service firms.
Middle · Middleware
Two forms live here: AI-application wrappers built after 2023 (interface, workflow, domain fine-tuning on top of foundation models); and legacy tool platforms serving specific professional-service verticals. Taking finance as an illustration: M&A data-room providers (Datasite, Intralinks), financial data terminals (Capital IQ, Bloomberg, FactSet, PitchBook), and similar. Both forms occupy the same position in the AI stack: the "intermediate link." This position is being compressed from three directions.
Squeezed from above. Model companies keep folding wrapper differentiation into the base API: long context, retrieval, tool calling, agent orchestration, even domain fine-tuning. When API prices fall, wrappers must follow, but their R&D and sales costs do not compress in step.
Squeezed from below (new contracts). AI-native end-to-end firms deliver the outcome directly. The client buys the result; separate data terminals and workflow tools are no longer procured alongside it.
Squeezed from within (renewal contraction). Legacy-platform revenue scales with incumbent headcount and deal volume: Bloomberg / Capital IQ subscriptions track analyst counts; Datasite contract sizes track annual deal volume. As incumbents contract under AI-native pressure, seat- and deal-linked revenue contracts with them.
The time horizon differs. In practice, most wrappers get absorbed by the next model generation within a short cycle; legacy platforms have compliance and proprietary-data buffers, but over the long run the direction is the same. Only two paths survive long enough to matter: (a) regulator-mandated infrastructure (HSR antitrust filings, SEC disclosures, CFIUS review workflows); and (b) proprietary datasets built on decades of human editorial work that models cannot cheaply synthesize (Capital IQ's private-company financials, PitchBook's PE/VC deal library, and similar). US AI-related layoffs in 2025 tracked around 55,000 roles[8]; a significant share landed in this layer. The middle tier is the place investors should be most cautious about this cycle.
Ceiling · Full-stack AI-native firms
This will be the tier with the highest multiples. The root cause is the cost gap from §1: production cost is several times lower than the traditional firm's, so even at half the client price the margin is higher.
The cost advantage opens two pricing paths for AI-native firms. Path one: seat-based subscription. Harvey and Legora take this path. The product is tool-like, the client retains decision responsibility, and the revenue curve resembles traditional SaaS but scales faster. Path two: end-to-end outcome-based pricing. Crosby, EvenUp, and Sierra take this path. Lower cost alone is not enough; the client has to be willing to place delivery responsibility on the AI-native firm's output. Once that trust is cleared, the firm can price on value captured at the moment of settlement, closing, or sign-off (Sierra hit $100M ARR in seven quarters and crossed $150M by February 2026)[9].
Across both paths, AI-native firms are scaling faster than the traditional SaaS curve across verticals: legal, insurance claims, customer-service agents. AI-native structural growth does not depend on any single pricing model; it is driven by a re-architecture of cost structure and delivery form. The same pattern surfaces independently in healthcare, where Abridge ($5.3B) and OpenEvidence ($12B) show the paradigm extends beyond professional services.
Cost advantage compounds over time: case data improves the model and agent system; successful deployments become reference architectures for the next sale; regulatory audit history becomes trust capital for enterprise buyers. Three flywheels together make the strong stronger. This tier's 55–67× forward-ARR multiples partly reflect pre-pricing of those flywheels (market liquidity and scarcity premia are also at work).
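The 55–67× range can be reproduced from the footnote [9] figures for the three firms with both a current valuation and a current ARR print (a quick check, not a valuation model; EvenUp, Hebbia, and others in [9] trade on older or partial data):

```python
# Implied ARR multiples for the ceiling tier, from the figures in footnote [9].
# (valuation $M, ARR $M) as of the dates cited there.
firms = {
    "Harvey": (11_000, 190),   # Feb 2026
    "Legora": (5_550, 100),    # ">$100M ARR", so this multiple is an upper bound
    "Sierra": (10_000, 150),   # Feb 2026
}
multiples = {name: val / arr for name, (val, arr) in firms.items()}
for name, m in multiples.items():
    print(f"{name}: {m:.0f}x ARR")
# Works out to roughly 55-67x, matching the range quoted in the text.
```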
Capital is voting with its feet. In the past 18 months, at least four dedicated PSF-focused funds have deployed billions (GC Fund XII, Eudia / Johnson Hana, Accrual, Crete Professionals Alliance); OpenAI took a direct equity stake in a CPA roll-up platform[10]. Model companies are no longer only suppliers; they are becoming equity holders in the replacements. A new choice has also opened on the buyer side that didn't exist in 2022: in professional services, buyers can build AI capability in-house or procure delivery directly from AI-native firms. The TAM for this build-or-buy decision is the same $3.9 trillion PSF stock referenced in §1, viewed from the demand side. Foundation Capital frames this shift as "Services-as-Software"[12].
V2 · The three-layer margin sandwich
The floor takes a fixed cut on every call. The middle gets squeezed from both sides. The ceiling captures the premium from selling outcomes. Tile size tracks valuation; hover any tile for ARR, multiple, and how they charge.
V4 · Which slices of professional services rip first
8 verticals across 5 regions. Darker cells are easier for AI-native firms to disrupt. The inner square grows with the local market size.
§3 · Core judgment: AI-native beats AI-equipped
A non-consensus call: over the medium-to-long term, AI-native firms built from zero will outperform AI-equipped firms retrofitting AI onto legacy architecture. Frontier models are equally available to both sides; they are not the decisive factor in competition. Three structural layers outside the model decide who wins.
(1) Capital structure
AI-native firms typically use common-equity architecture: a large fraction of profit can be retained and reinvested, decisions are fast, and equity can be distributed to non-founder employees. Partnership structures run on annual distributions and consensus-driven capital allocation: highly efficient in steady state, but structurally slow when a material fraction of annual profit needs to be pulled forward into a new business whose return horizon spans several distribution cycles. Limited mechanisms for extending equity below partner level also cap the firm's ability to attract top-tier AI talent.
(2) Legacy constraint inheritance
Both sides face bar rules, SEC regulation, IFRS, and GDPR. The AI-equipped firm's disadvantage is that it operates inside a framework full of historical constraints: cross-client independence restrictions, surviving clauses from past engagement letters, conflict-check restrictions from historical clients, and limitations on how historical client data can be used. These constraints decide, case by case, which verticals an AI-equipped firm can enter and whether it can train on its own decade of accumulated data. The AI-native firm carries none of these burdens.
(3) Unit-economics co-design
An AI-native firm's four pillars (cost structure, pricing model, delivery workflow, talent profile) are all designed around the agentic workflow. An AI-equipped firm is adding AI to a system built around labor time for decades: hourly billing, seniority-based pricing, and time-based performance evaluation are all still in place, and AI just raises per-hour efficiency. Short-term productivity improves; long-term unit economics do not converge.
From an investment standpoint: (a) For professional-services investing, be cautious of deals that bolt AI onto an existing partnership structure. (b) For AI transformation inside an incumbent firm, the real bottleneck is firm architecture, not technology; buying AI tools alone cannot deliver the unit-economics improvement. (c) For building a new AI-native firm, equity structure, vertical selection, and agentic workflow must all be designed into the architecture from day one.
Sources
- [1]Traditional client-facing price per mid-complexity memo. Wolters Kluwer ELM Solutions Real Rate Report 2025; Thomson Reuters State of the Legal Market 2025: associate $450/h (AmLaw 200 midpoint), partner $1,200/h (Tier-1 equity-partner median). Staffing 6–8 associate hours + 0.5–1 partner review hour from AmLaw 2024. Waterfall: midpoint $450×7h + $1,200×0.75h = $4,050; rack-rate range $3,300–$4,800; after ~7–10% annual discount / year-end write-down, the effective billed range converges to $3,000–$4,500. ↩
- [2]Agent-native client-facing flat-fee pricing: Crosby disclosed range for comparable transactional work. TechCrunch, "Crosby raises $30M for AI-native law firm," 2025; The Information 2025 coverage of Crosby flat-fee matter types. The precise $1,000–2,000 range is the author's analogy estimate. ↩
- [3]Traditional internal cost per memo (elite BigLaw scope). First-year associate base $225K (NALP 2025 top-tier; market-wide median $200K) × fully-loaded multiplier 1.8–2.2× = $405–495K fully loaded. At 1,800 billable hours and ~7h per memo → ~260 memos/year; associate allocation $1,560–1,905 per memo. Senior partner fully loaded ~$1.5M / 1,800 billable hours × 0.75h review → ~$625 per memo. Total $2,185–$2,530, rounded to $2,200–$2,500. ↩
- [4]Agent-native internal cost per memo, task-level processing cost scope (associate + Harvey seat + token; excl. partner review). Associate loaded $400–500K; Harvey seat $14,400/year (Forbes 2024 estimate; Harvey does not publish pricing). Productivity multiplier 5–10× research-subtask upper bound → $150–300 per memo; full workflow 3–7× → $230–530. Adding senior-review allocation (Crosby-type firms still run 0.1–0.3h review, ~$80–180) plus brand / acquisition overhead yields a fully-loaded figure of ~$300–500 per memo — comparable with [3]'s $2,200–2,500 fully-loaded, a 4–8× gap. ↩
- [5]Global PSF TAM and vertical / regional split. Legal $0.85–1.03T, Accounting $675B, Consulting $359B, Tax $290B, Audit ~$260B, M&A $52B, IT services $1.42T, BPO $350B (Statista, IBISWorld, Gartner, IFAC, LSEG, Grand View 2024); de-duplicated total ~$3.9T (author's consolidation). Regional split per Statista and Kennedy Consulting Research 2024. ↩
- [6]OpenAI ARR: $200M (2022) → $1.6B (2023) → $5.5B (2024) → $20B (October 2025) → ~$25B (April 2026, Sacra/i10x aggregation); inference gross margin ~70% per latest public disclosure (October 2025); Bloomberg 2025-12-21 reference; compute-only margin, excludes training amortization (consolidated margin after training ~33%). Anthropic ARR: $10M (2022) → $100M (2023) → $1B (2024) → $9B (year-end 2025) → ~$30B (April 2026, SaaStr / Anthropic / Yahoo Finance); Claude Code alone $2.5B ARR by February 2026. Combined top-two ARR ~$55B. Frontier token cost: GPT-4 (March 2023) $36 blended → GPT-5 (August 2025) $3 blended = 12× over 29 months (the "10×" in the body is a conservative round number). Since 2025, frontier has re-anchored upward with each new generation (o1-preview, Claude Opus 4, o3-pro, Opus 4.5/4.7 all at premium pricing); current April 2026 frontier blended ~$9. Sources: OpenAI / Anthropic / Google pricing pages 2023–2026; Epoch AI, "Trends in LLM Cost per Token," 2025 (capability-normalized decline ~40×/year). ↩
- [7]Enterprise LLM market concentration: top-three combined share 88% (Menlo Ventures, 2025: The State of Generative AI in the Enterprise; no 2026 edition yet). HHI ~2,770 from published shares. 2025 hyperscaler AI capex actual ~$433B: Microsoft FY25 $88.2B (incl. finance leases), Alphabet 2025 $91.4B, Amazon 2025 $131.8B, Meta 2025 $72.22B, Oracle FY26 ~$50B (each company's 2025 Q4 / FY25 earnings releases). 2026 guidance aggregates to $600B+: Amazon ~$200B, Alphabet $175–185B, Meta $115–135B, Oracle ~$50B, Microsoft external estimates $120–146B. ↩
- [8]Middleware clearing and AI-related layoffs. Jasper revenue $120M (2023) → $55M (2024, −54%); valuation cut to ~$1.2B (2024). Robin AI split fire-sale: after failing to raise $50M in October 2025, the managed-services arm was sold to Scissero in December 2025, and the engineering team was acqui-hired by Microsoft in January 2026 for Word-for-lawyers (Artificial Lawyer 2026-01-09; Legal IT Insider 2026-01-12). US AI-cited layoffs in 2025: 54,836 roles (Challenger, Gray & Christmas, 2025 Year-End Report), ~5% of US total layoffs of 1,206,374. 2026 Q1 AI-cited layoffs already reached 27,645 (March alone: 15,341), indicating an accelerating pace. ↩
- [9]AI-native firm valuations and ARR trajectories (as of April 2026). Harvey $11B / $190M ARR (February 2026, Forbes) → >$200M ARR (March 2026, Business Insider); Legora $5.55B / >$100M ARR (Reuters 2026-03-10; Legora newsroom 2026-04; acquired Walter in 2026 Q1); Sierra $10B / $100M ARR in seven quarters, crossed $150M by 2026-02 (Sierra official Year Two in Review 2026-02-06); Hebbia $700M / $13M profitable (Forbes 2024-06; older data — apply caveat); Basis $1.15B, 30% of US Top-25 accounting firms (Bloomberg 2026-02-24); EvenUp $2B / ~$110M ARR (The Information 2025-10); Crosby $400M / Series B $60M (Forbes 2026-03-31). ↩
- [10]Capital flow to AI-native professional-services consolidation. Past 18 months: at least 4 PSF-focused funds totaling billions — GC Fund XII (2024-08), Eudia / Johnson Hana (2025-10), Accrual (2025), Crete Professionals Alliance (Thrive Capital + OpenAI Startup Fund, 2024–2025). OpenAI took an equity stake in Thrive Holdings December 2025 (The Information). Note: Long Lake (Khosla 2025, HOA / property-management) is an AI-enabled services roll-up but not PSF, and is not included in this basket. ↩
- [12]Services-as-Software concept: Joanne Chen, "AI leads a service as software paradigm shift," Foundation Capital, April 2024; retrospective "The $4.6T Services-as-Software opportunity: Lessons from the first year," Foundation Capital, April 2025. Foundation Capital's $4.6T is an investment-framework TAM on a different scope from this essay's $3.9T ($3.9T is Statista / IBISWorld aggregation, de-duplicated). This essay anchors on $3.9T throughout and references Foundation Capital here only for the conceptual framing. ↩