When inference becomes a scarce resource, who captures the value?

2026/06/09 04:19

Author: Frank Fu, IOSG

 

The hole David Cahn identified in 2023 was never going to be filled on the training side. It is being filled on the inference side, and the market has only begun to price that in over the past few weeks. Now that Nvidia has regrouped its financial disclosure around "serving tokens" and the Cerebras IPO was 20 times oversubscribed, the fight over where the bottleneck sits is over. The real question is the next one: when inference becomes a scarce resource, at which layer of the compute stack does the value settle?

I. Follow the GPUs: from $200 billion to $600 billion

In 2023, David Cahn of Sequoia raised the question that hangs over the entire AI build-out: the "$200 billion question". Every dollar spent on GPUs is matched by roughly another dollar spent on the data centre and the energy to run it, so each year of GPU CapEx implies that the chips must ultimately generate about $200 billion in revenue to recover the capital. Even under very generous assumptions about AI revenue, he found a hole of more than $125 billion between what was being invested and what end users were paying. The concern was clear: GPUs were being overbuilt ahead of real demand.

A year later, the gap had widened rather than narrowed. In his 2024 follow-up, Cahn recast it as the "$600 billion question" as hyperscaler CapEx expanded. The bear case had a familiar shape: overbuilding leads to oversupply, which burns capital.
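The arithmetic behind both headline figures can be sketched in a few lines. The inputs used here (a GPU run rate of roughly $50 billion in 2023 and $150 billion in 2024, a 50 per cent gross margin for the company selling AI services, and $75 billion of assumed AI revenue in 2023) are recalled from Cahn's articles rather than stated in this piece, so treat them as assumptions; the function names are illustrative.

```python
def required_revenue_bn(gpu_run_rate_bn: float,
                        data_centre_multiplier: float = 2.0,
                        end_user_gross_margin: float = 0.5) -> float:
    """Revenue (in $bn) needed to pay back one year of GPU CapEx."""
    # Each GPU dollar is matched by roughly one more dollar of data-centre
    # and energy cost, so total cost is about twice the GPU spend.
    total_cost_bn = gpu_run_rate_bn * data_centre_multiplier
    # Assumption: the company selling AI services needs this gross margin,
    # so revenue must exceed cost by a factor of 1 / (1 - margin).
    return total_cost_bn / (1.0 - end_user_gross_margin)


def revenue_hole_bn(required_bn: float, assumed_ai_revenue_bn: float) -> float:
    """Gap (in $bn) between required revenue and assumed actual AI revenue."""
    return required_bn - assumed_ai_revenue_bn


need_2023 = required_revenue_bn(50.0)          # assumed 2023 GPU run rate
need_2024 = required_revenue_bn(150.0)         # assumed 2024 GPU run rate
hole_2023 = revenue_hole_bn(need_2023, 75.0)   # assumed 2023 AI revenue

print(f"2023 required revenue: ${need_2023:.0f}bn, hole: ${hole_2023:.0f}bn")
print(f"2024 required revenue: ${need_2024:.0f}bn")
```

Under these inputs the sketch gives $200 billion required and a $125 billion hole for 2023, and $600 billion required for 2024. The requirement scales linearly with GPU spend, which is why a larger build-out widens the question rather than closing it.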

Both articles are really asking the same thing: who is going to fill this hole? The answer never appears on the training side of the ledger. It appears on the inference side, and the market has only begun to price it in over the past few weeks.

II. The Cerebras IPO and the inference squeeze

Cerebras listed on Thursday. The IPO was 20 times oversubscribed, and the shares traded at nearly double the final, upwardly revised offer price set on Wednesday. The demand did not come from a bet on the next Nvidia killer, but from something simpler: the market is beginning to realize that the real bottleneck in AI is inference, not training.

Cerebras's strength is a chip architecture that makes inference very fast. Not training, inference. That is what got Wall Street's attention. The inference market is recurring: it expands with usage. Every time Claude answers a question, every time an agent takes on a task, it consumes compute. Training happens once; inference never stops.

J.P. Morgan estimates the inference market at 10 to 50 times the size of the training market. Once machines start taking tasks from other machines, which is what scaling in agentic form means, inference demand no longer grows with the number of users but with compute itself.

III. Nvidia redraws the map: inference becomes the headline

If Cerebras was the market's awakening, Nvidia's latest quarter is confirmation from the top of the chain. On the latest earnings call, Jensen Huang made explicit what had gone unsaid: AI demand is growing parabolically. The reason is simple: agentic AI has already arrived. Mainstream AI has moved from one-shot inference to reasoning, and then to agents that call tools and organize tasks on their own. In Huang's framing, tokens are now profitable, and in the AI era compute capacity is revenue and profit.

This is reshaping the whole industry. Training is the one-time cost of building a model, inference is the recurring cost of running it, and today's bottleneck is inference, not training.

Nvidia has written this judgment into its financial reporting. It now reports two platforms rather than one: Data Center and Edge Computing. Data Center (approximately $75 billion for the quarter, +92 per cent year on year) is further split into Hyperscale (approximately $38 billion, +12 per cent quarter on quarter) and ACIE, i.e. AI clouds, industrial and enterprise (approximately $37 billion, +31 per cent quarter on quarter). The entirely new line is Edge Computing: $6.4 billion, +29 per cent, covering the devices where models actually run at the edge, such as PCs, workstations, AI-RAN base stations, robots and cars.

Edge still accounts for less than 8 per cent of total revenue, but Nvidia has elevated it to a second platform alongside the data centre. The signal is that inference is splitting into two fronts: cloud inference in the data centre, and on-device inference at the edge, where AI has to see, move and act in the physical world. The road map follows the same logic: Vera Rubin arrives from the third quarter. In addition, Huang put a brand-new $200 billion TAM on the Vera CPU, which is designed for agentic workloads. Every frontier model company is expected to move over to it fully from day one.
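The "less than 8 per cent" share can be checked against the segment figures quoted above. This is a rough sketch that assumes total revenue is approximately Data Center plus Edge Computing; any other revenue lines are not given in this article and are ignored here.

```python
# Segment revenue for the quarter in $bn, as quoted in the text (approximate).
hyperscale_bn = 38.0
acie_bn = 37.0
edge_bn = 6.4

# Data Center is reported as the sum of its two sub-segments.
data_center_bn = hyperscale_bn + acie_bn

# Assumption: the two platforms together approximate total revenue.
total_bn = data_center_bn + edge_bn
edge_share = edge_bn / total_bn

print(f"Data Center: ${data_center_bn:.0f}bn")
print(f"Edge share of total: {edge_share:.1%}")
```

The two sub-segments sum to the quoted $75 billion, and the edge share comes out just under 8 per cent, consistent with the claim.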

When the most valuable company on the planet reorganizes its financial disclosure around "serving tokens", the bottleneck question is settled. The rest of this article deals with who captures the value when inference, rather than training, becomes the scarce resource.

A scoping note first. Of the two fronts, this article deals with the cloud side: rented data-centre GPUs that serve API tokens to external customers. The edge side refers to local chips inside the device itself (Jetson, RTX, Drive, AI-RAN), which run entirely outside GPU leasing and aggregation. Treat the edge here as a tailwind that enlarges the whole inference economy and supports the bottleneck thesis, not as the market Hyperbolic and Venice operate in. Both sit entirely on the cloud side.

IV. The squeeze has arrived

Anthropic is the canary in the coal mine. With usage far exceeding provisioned capacity, complaints that Claude has been "lobotomized" are all over the web, citing rate-limited responses, slower inference and compressed context windows. The fix is raw compute: in May 2026, Anthropic took over the entire Colossus 1 data centre from SpaceX, with more than 220,000 Nvidia GPUs and 300+ MW, and dedicated it to inference, not training.

That capacity unlocked a series of limit changes, each of them a signal. On 6 May, Anthropic doubled the five-hour limit for Claude Code, removed the peak-hour restrictions and significantly raised the API limits for Opus. On 13 May, the weekly limit for Claude Code was raised by another 50 per cent (until 13 July). Then, effective 15 June, it moved in the opposite direction: usage through the Agent SDK, headless mode (claude -p) and CI pipelines was taken out of flat-rate subscriptions and placed in a separately metered credit pool ($20-200 per month at API prices). That last step condenses the whole argument into one action: agents consume inference far faster than flat-rate subscriptions were designed to tolerate, so agent usage has to be priced as what it is, a recurring cost.

Training is a one-time capital expenditure. Inference is a recurring operating cost that accrues with every new user and every new agent.

V. The stack: six layers, one bottleneck

Every AI application sits on a supply chain that starts at TSMC's wafer fabs and ends at an API endpoint.

Most companies own only one of these layers. Nvidia owns the silicon, CoreWeave owns the bare metal, Together AI owns inference optimization, and OpenRouter owns model API routing.

With one exception.

VI. Hyperbolic: the only company spanning three layers

Hyperbolic launched its GPU marketplace in June 2025. Within the first few months it passed 200,000 developers, with users that include frontier AI labs, search companies and large consumer platforms.

What is interesting is its structure.

Hyperbolic does not own the GPUs it sells. Every card comes from neoclouds and data centres, including CoreWeave, Lambda Labs, Nebius and smaller operators with idle capacity. That sounds like a weakness, but it is the moat.

By sitting between GPU suppliers and consumers, Hyperbolic sees real-time data that no one else can see. It knows who buys which GPU, at what price and at what time. It sees oversupply before it becomes public, and demand surges before they reach the market.

Today, the moat itself is multi-cloud aggregation. Hyperbolic stitches fragmented capacity from dozens of independent clouds and data centres into a standardized, unified pool, so developers can rent the cheapest available GPU anywhere without negotiating with each operator or managing a pile of accounts. The more clouds it connects, the deeper its liquidity and the richer its pricing data. Looking ahead, the team is exploring how to use this data to model the GPU price curve and eventually to commit its own capital to smooth supply and demand, acting as a market maker in physical compute. That goal is still at an early stage, however; it is the aggregation layer that actually earns money today.
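As a rough illustration of what "rent the cheapest available GPU anywhere" means at the routing level, here is a minimal sketch. It is not Hyperbolic's implementation: the `Offer` type, the provider labels and all prices are hypothetical and exist only to show the selection logic over a pooled order book.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Offer:
    provider: str          # hypothetical provider label
    gpu_model: str
    price_per_hour: float  # USD per GPU-hour (made-up numbers)
    available: int         # GPUs currently free at this provider


def cheapest_offer(offers: Iterable[Offer], gpu_model: str,
                   gpus_needed: int) -> Optional[Offer]:
    """Return the lowest-priced offer that can fill the whole request."""
    candidates = [
        o for o in offers
        if o.gpu_model == gpu_model and o.available >= gpus_needed
    ]
    # default=None covers the case where no provider has enough capacity.
    return min(candidates, key=lambda o: o.price_per_hour, default=None)


# A pooled view across three hypothetical clouds.
OFFERS = [
    Offer("cloud-a", "H100", 2.40, 16),
    Offer("cloud-b", "H100", 1.95, 8),
    Offer("cloud-c", "H100", 1.80, 4),
    Offer("cloud-c", "A100", 1.10, 32),
]

best = cheapest_offer(OFFERS, "H100", gpus_needed=8)
print(best.provider if best else "no capacity")
```

In this made-up pool the cheapest H100 listing cannot fill an 8-GPU request, so the request goes to the next-cheapest provider that can. Every such decision also records a price, a size and a timestamp, which is the pricing data the paragraph above describes.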

This is the flywheel:

  1. Connect more clouds, aggregate more supply

  2. More supply, deeper markets and real-time pricing data

  3. Better data, smarter routing now and pricing models over the longer term

  4. Better liquidity and prices, more developers, more clouds wanting to connect

No other company is attempting this. Hyperbolic is the only company that spans all three of the GPU rental, deployment and model API layers.

VII. Venice: the mirror

Venice is the clearest expression of inference economics at the application layer and a useful contrast to Hyperbolic's position. It is a privacy-first inference application: an OpenAI-compatible API plus consumer subscriptions (Free / Pro / Pro+ / Max) that route requests to roughly 75 models, of which about two thirds are open-source or self-hosted models (Llama, Mistral, Qwen, DeepSeek) and the rest are anonymized pass-throughs to closed-source frontier models. The point is that Venice owns no meaningful compute of its own. It rents from the GPU partners and confidential-computing providers it has publicly named (NEAR AI Cloud, Phala) and pays the frontier labs for pass-through, so its real cost base is inference, not SaaS hosting.

What Venice really sells is privacy. "Private" here does not mean turning commodity compute into private property. It means wrapping a layer of guarantees around commoditized inference: no data retention, no training on user data, anonymized requests, and partial execution inside TEEs so that even the operator cannot see the data. The underlying product is routed workloads; the mark-up is the privacy wrapper. That guarantee is tiered and uneven: an open-source model run under Venice's own control or on TEE GPUs can be private end to end, but with an anonymized pass-through to a closed-source model such as Claude or GPT, privacy only strips your identity, and your raw prompt is still processed by the frontier lab at the other end. So the strongest privacy covers only the open-source models, and the frontier models are "anonymous" rather than "truly confidential". Venice's gross margin = subscription price - inference cost paid downstream. Whatever it charges above raw API prices is supported almost entirely by the privacy premium, which is why the margin is thin and exposed to the frontier labs' pass-through pricing.
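The margin identity above can be made concrete with a small sketch. The subscription price and inference cost used here are purely hypothetical (Venice's actual unit economics are not given in this article); the point is only to show how a thin margin reacts to a change in downstream pricing.

```python
def gross_margin(subscription_price: float, inference_cost: float) -> float:
    """Gross margin per subscriber: price minus downstream inference cost."""
    return subscription_price - inference_cost


def margin_after_cost_change(subscription_price: float,
                             inference_cost: float,
                             cost_change_pct: float) -> float:
    """Gross margin after downstream inference prices move by a percentage."""
    new_cost = inference_cost * (1.0 + cost_change_pct / 100.0)
    return gross_margin(subscription_price, new_cost)


# Hypothetical monthly figures in USD, for illustration only.
PRICE = 18.0
COST = 15.0

base = gross_margin(PRICE, COST)
shocked = margin_after_cost_change(PRICE, COST, 10.0)

print(f"margin before: ${base:.2f}, after a 10% cost increase: ${shocked:.2f}")
```

With these made-up numbers, a 10 per cent rise in downstream inference cost cuts the margin from $3.00 to $1.50 per subscriber. A reseller whose mark-up is small relative to its cost base absorbs upstream price moves almost one for one.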

The token design packages this demand. Venice runs on two tokens: VVV (staking and platform access) and DIEM, an inference credit, with each DIEM worth approximately $1 of inference per day. Paid subscriptions trigger programmatic buy-backs of VVV (approximately US$2 / 5 / 10 for Pro / Pro+ / Max, respectively), while emissions fall on a fixed schedule: 6M → 5M → 4M VVV per month, dropping to 3M on 1 July. The buy-backs are real, but they are discretionary and still modest: in April and May about US$103,000 of VVV was burned, and June is slowly climbing toward about US$110,000, well below the US$200,000 per month line.

The fundamentals tell a more sober story than the headline. The publicly circulated figure of US$70 million ARR almost certainly counts more than net subscription revenue; a defensible range is closer to US$6 million to US$15 million. Within that range, the activity is real: about 136,000 token-holding addresses, about 9.9 million website visits per month (about 330,000 a day), and new Pro subscriptions hovering around 1,400 a day. This is a real business, but it is a thin-margin business whose economics depend on the compute it buys.

That is why Hyperbolic sits upstream of it. If Venice is a gas station, Hyperbolic is a refinery. Venice buys compute from the same constrained supply that everyone depends on; Hyperbolic aggregates that supply, standardizes it and sells it to Venice and every player like it. As inference demand grows, value accrues not only to the applications that consume compute, but also to the layer that aggregates and routes compute and captures the accumulated inference spend of those applications.

Why this matters

Nvidia has restructured its financial disclosure around "serving tokens". The Cerebras IPO shows the market has understood that inference is the bottleneck. Anthropic's scramble for capacity shows the problem is real. Agentic and physical AI will amplify this demand by orders of magnitude across both cloud and edge.

This also echoes the "$600 billion question" from the other side. Cahn's bear logic, overbuilding followed by overcapacity, may well be vindicated eventually. But a glut is precisely the best environment for an asset-light aggregator: when GPU prices fall and supply is fragmented across dozens of clouds, the player that owns no hardware and routes each workload to the cheapest available card earns the spread, while the operators holding depreciating GPUs bear the loss. In that scenario Hyperbolic is positioned long, not short.

The company that wins in the end will not be the one with the most GPUs, but the one that can tell you which GPUs are where, at what price they are available, and where each workload will run at the lowest cost.

Hyperbolic is building that company. It owns no GPUs and is pure software, yet it runs three layers deep and is building what aims to be the ultimate compute layer for inference.
