Part 2 of a series on why Spectral and SCALE exist.

In Part 1, I argued that cross-vendor portability in accelerated computing must be delivered by a company, rather than a committee, because the implementation is the standard. A reasonable reader would finish that argument by asking the obvious follow-up: fine, but what happens when the implementation gets written by an LLM?

It's a timely question. If agents can write CUDA, the skeptic says, then who needs a CUDA toolchain that runs everywhere? Just point the model at each backend and have it emit native code: CUDA here, ROCm there, SYCL somewhere else. The compiler becomes a quaint historical artifact, the assembly programmer of the 2030s.

Let us take that view seriously, because at first hearing it sounds like it dissolves the whole thesis behind Spectral. It doesn't. The opposite, in fact: agentic code generation makes the compiler and runtime stack more valuable, not less. Here's why.

LLMs and compilers do categorically different jobs

A compiler is a deterministic function from source to machine code. Same input, same flags, same compiler version: same output, every time. That property is not incidental. It's the entire reason you can trust a billion lines of code running on hardware you've never seen.

An LLM is a probabilistic function from context to plausible tokens. Same input, different output, depending on temperature, sampling, model version, and the phase of the moon. That property is also not incidental — it's what makes the model useful for ideation, exploration, and synthesis. But it is exactly the wrong property at the metal layer.

Nobody wants merely stochastic correctness on FMAs, memory fences, or atomics. Nobody wants a kernel which produces subtly different results across runs because the model got creative about which warp did the reduction. Think of the LLM as the brain and the compiler as the hammer: the brain decides what to compute, the hammer forges the runtime executable, the same way, every time.
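The reduction-order point is easy to reproduce without a GPU. The sketch below is illustrative Python rather than CUDA, and the function names and input values are invented for this example. It sums the same four numbers in two orders, a left-to-right loop and a pairwise tree of the kind a parallel reduction uses, and gets two different answers, because floating-point addition is not associative.

```python
def sequential_sum(values):
    """Add left to right, as a single thread would."""
    total = 0.0
    for v in values:
        total += v
    return total


def pairwise_sum(values):
    """Add in a balanced tree, as a parallel reduction would."""
    if len(values) == 1:
        return values[0]
    mid = len(values) // 2
    return pairwise_sum(values[:mid]) + pairwise_sum(values[mid:])


# Illustrative input: at magnitude 1e16, adjacent doubles are 2 apart,
# so adding 1.0 to 1e16 rounds straight back to 1e16.
values = [1e16, 1.0, -1e16, 1.0]

print(sequential_sum(values))  # ((1e16 + 1) - 1e16) + 1
print(pairwise_sum(values))    # (1e16 + 1) + (-1e16 + 1)
```

Each order is perfectly repeatable on its own. The trouble starts when the order is chosen afresh every time the kernel is regenerated, which is the situation a fixed compiler and a fixed source file are there to prevent.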

These two things look superficially similar — both produce code — but they sit on opposite sides of a categorical divide. Conflating them is the error.

What agents actually need from a substrate

Once you accept that the agent and the compiler do different jobs, a useful question becomes: what does the agent need from the substrate — the compiler, runtime, and semantics it writes against — in order to be productive?

Three things, mostly.

First, fast and structured feedback. An agent iterating on a kernel needs compile errors it can parse, deterministic failure modes, and reproducible builds. The faster and more legible the feedback loop, the fewer attempts the agent will burn through before it lands on something which works. A toolchain emitting a wall of cryptic template errors is, for an agent, roughly as useful as no toolchain at all.
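As a rough illustration of what "errors it can parse" means in practice, here is a minimal Python sketch that turns text diagnostics into structured records. It assumes the `file:line:column: severity: message` layout that GCC and Clang use; the sample output, the file name `reduce.cu`, and the function name `parse_diagnostics` are made up for the example and do not describe any particular toolchain's actual output.

```python
import json
import re

# Assumed layout: "file:line:column: severity: message", the convention
# GCC and Clang use. Other compilers format diagnostics differently.
DIAGNOSTIC = re.compile(
    r"^(?P<file>[^:\n]+):(?P<line>\d+):(?P<column>\d+): "
    r"(?P<severity>error|warning|note): (?P<message>.*)$"
)


def parse_diagnostics(compiler_output):
    """Return one dict per diagnostic line, skipping everything else."""
    records = []
    for raw in compiler_output.splitlines():
        match = DIAGNOSTIC.match(raw)
        if match is None:
            continue  # source snippets, carets, summary lines
        record = match.groupdict()
        record["line"] = int(record["line"])
        record["column"] = int(record["column"])
        records.append(record)
    return records


# Made-up compiler output for a hypothetical file, for illustration only.
sample = """\
reduce.cu:12:5: error: use of undeclared identifier 'tid'
    tid = threadIdx.x;
    ^
reduce.cu:20:18: warning: unused variable 'scratch'
1 warning and 1 error generated.
"""

diagnostics = parse_diagnostics(sample)
print(json.dumps(diagnostics, indent=2))
```

A regex over free text is the fragile version of this idea. It works only as long as the message layout never changes, which is why a toolchain that emits the structured form directly is the better foundation for an agent.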

Second, one mental model rather than N, one per vendor or GPU target. Every fragmentation of the substrate fragments the training distribution underneath the agent and multiplies the surface area for hallucination. If the agent has to remember that intrinsics are named one thing on NVIDIA, another on AMD, and a third on Intel, and that the memory model has subtly different guarantees on each, its effective competence drops on every one of them. The model that deeply knows one substrate beats the model that knows N substrates shallowly.

Third, an environment that doesn't lie. The agent's productivity is bounded by how quickly it can iterate against something whose behaviour matches its documentation. A toolchain with silent miscompiles, undocumented edge cases, or platform-specific behaviour that only shows up at runtime is a productivity sink whether the developer is human or not. Arguably more so when the developer is an agent, because the agent has no intuition for "that smells wrong, let me check."

Unsurprisingly, these three things are also prerequisites for human developer productivity.

Volume changes the calculus

Here's the part that I think gets underweighted in conversations about agentic coding.

Human-authored GPU code is, in the grand scheme of things, a small corpus: tens of thousands of serious kernels, written mostly by specialists, reviewed mostly by other specialists. The substrate underneath those kernels can afford to be sharp-edged, because the people using it know where the edges are.

Agent-authored GPU code is a different order of magnitude: potentially millions of kernels, increasingly directed by people who aren't GPU specialists and shouldn't have to be. The person building a vertical AI product is thinking about what their inference path should do. Whether memory accesses coalesce is precisely the kind of detail the stack should handle for them.

In that world, the substrate isn't a convenience. It's the load-bearing element. Without a robust compiler and runtime underneath the agent, you don't get 10x the engineering output; you get 10x the failed attempts it takes to get anything working. The thing that turns "generated code" into "running code" at scale is the toolchain. Take it away and the agent's apparent productivity collapses into a pile of broken kernels nobody can fix, generated by an agent that burned through a small fortune in inference tokens trying to fix them itself.

CUDA is the substrate agents already know

Of all the GPU programming substrates in the world, CUDA has by a wide margin the largest public corpus: kernels, documentation, error messages, blog posts, and two decades of accumulated public Q&A. It also has the most stable semantics over the longest window. Ask a frontier model to write a GPU kernel and CUDA is what it will write best.

The "just emit native code for each backend" plan throws this away. It asks the model to do its worst-performing thing on every backend other than NVIDIA's, and it fragments the training distribution it does have. The agent ends up worse at all of them than it would be at the one.

The alternative is to let the agent write for the substrate it actually knows well and let the compiler handle the targeting. The agent writes CUDA. SCALE makes it run on whatever's underneath. The agent's competence stays concentrated. The silicon stays interchangeable.

This isn't an argument about loyalty to NVIDIA's ISA. It's an argument about training corpora and semantic stability. Had the world settled on SYCL and the models been trained on a decade of SYCL kernels, the argument would point at SYCL. It does not, because they were not.

The compiler and runtime workload is growing

Step back from the agent question for a moment to look at the compiler and runtime stack as a workload. It is not shrinking. What people want from it now includes, but is not limited to:

Structured diagnostics designed to be consumed by an LLM rather than read by a human — schema-stable, machine-parseable, dense with the information an agent needs to fix the bug.
Incremental builds and tight iteration loops that make exploration cheap.
Autotuning surfaces the agent can drive end-to-end.
Profile-guided feedback the agent can act on.
Runtime introspection that exposes what's actually happening on the device in a form the agent can reason about.

Every one of these is a compiler-and-runtime problem. None of them get smaller as agents get better. They get larger, because the agent is now a first-class consumer of everything the toolchain emits, and the toolchain has to be designed accordingly.
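To make one item on that list concrete, here is a minimal sketch of an autotuning loop an agent could drive: enumerate a search space, measure each configuration, keep the best. Everything in it is hypothetical. The names `autotune`, `toy_cost`, `block_size`, and `unroll` are invented for illustration, the cost formula is a stand-in for a real benchmark, and none of it describes an actual SCALE interface.

```python
import itertools


def autotune(measure, search_space):
    """Exhaustively try every configuration and return the cheapest.

    measure: callable taking a config dict and returning a cost
             (lower is better), e.g. a kernel's measured runtime.
    search_space: dict mapping parameter name to candidate values.
    """
    names = list(search_space)
    best_config, best_cost = None, float("inf")
    for combo in itertools.product(*(search_space[n] for n in names)):
        config = dict(zip(names, combo))
        cost = measure(config)
        if cost < best_cost:
            best_config, best_cost = config, cost
    return best_config, best_cost


def toy_cost(config):
    """Stand-in for a real benchmark. The formula is invented for
    illustration and is not a model of any GPU."""
    block = config["block_size"]
    unroll = config["unroll"]
    return abs(block - 256) / 256 + abs(unroll - 4) / 4


# Hypothetical tuning parameters and candidate values.
space = {"block_size": [64, 128, 256, 512], "unroll": [1, 2, 4, 8]}
best, cost = autotune(toy_cost, space)
print(best, cost)
```

The loop itself is trivial. What the toolchain has to supply is the `measure` step: a build that is fast enough to run sixteen times, and timings reproducible enough that the comparison means something.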

The "compilers are obsolete" framing misses this part of the picture entirely. The agent does not replace the compiler. It leans harder on it.

Closing

Agentic AI doesn't make the compiler obsolete. It raises its leverage. The companies that win the agent era will be the ones whose substrate agents can target without ever thinking about the hardware underneath. Stable semantics, structured diagnostics, deterministic behaviour, one mental model that works everywhere: these are the properties that make agents productive, and every one of them is a compiler property.

The brain decides what to compute. The hammer forges the code to make it run.

With SCALE we are building the hammer and anvil to help you shape the intelligence of the future.