We are at the SC25 Conference in St. Louis, Nov. 16-21, 2025!
St. Louis, MO
Nov. 16-21
Booth #6552
Book your Meeting

Write CUDA. Run Everywhere.

Your CUDA skills are now universal. SCALE compiles your unmodified CUDA applications to run natively on any accelerator, ending the nightmare of maintaining multiple codebases.

Your Existing Code (NVIDIA)

nvcc my_app.cu -o my_app_nvidia
With SCALE (On Any Accelerator)

nvcc my_app.cu -o my_app_portable
From CUDA Source to Native Code

True Compilation, Not Emulation

SCALE compiles CUDA source code directly to native machine instructions for non-NVIDIA GPUs, delivering native performance with no emulation or translation overhead.

Your *.cu source → SCALE compiler (`nvcc`), built on LLVM to leverage existing vendor backends →
  • Native AMD machine code
  • Native Intel machine code (future)
  • Native $AI_Accelerator code

Read Docs
Competitive Advantage

Why SCALE Over Other Solutions?

            | Our Approach          | Auto Source-to-Source (HIPIFY)      | Alternative Languages (OpenCL)
Codebase    | Single CUDA codebase  | Two+ codebases to maintain          | Complete rewrite needed
Process     | Direct compilation    | Fragile source translation          | New language, new ecosystem
Result      | “Just make CUDA work” | A “compatibility tax” on developers | Abandons existing CUDA investment
Benchmarks

Native Performance on AMD Hardware

SCALE often outperforms existing solutions, and we're just getting started.

9.4x
Up to 9.4x faster than HIP
2.85x
Average speed-up over HIP
8/11
Workloads faster than HIP

Rodinia Benchmarks: SCALE vs HIP

Speed-up over HIP on AMD Instinct MI300X

Key Features

Fixing Common PTX Pitfalls

Inline PTX assembly is common in CUDA programs because it is the only way to access certain valuable hardware features. However, NVIDIA's compiler provides virtually no validation for this part of the language. Since we have to parse inline PTX to compile it for AMD GPUs, we also provide proper warnings and errors, making this dark corner of the language much easier to work with.

Trivial Mistakes

Even trivial mistakes are a pain with NVCC:

Truncated Pointer

A common mistake is to pass a C++ pointer directly into a PTX asm block:
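The example below is an illustrative sketch rather than a snippet from the SCALE documentation (the loadViaPtx helper is made up): the "r" constraint binds a 32-bit register, so a 64-bit pointer passed through it is silently truncated, while the 64-bit "l" constraint is what the operand actually needs.

truncated_pointer.cu
__device__ int loadViaPtx(const int *ptr) {
    int out;
    // BUG: "r" is a 32-bit register constraint, so the 64-bit pointer is
    // silently truncated before the load.
    asm("ld.global.u32 %0, [%1];" : "=r"(out) : "r"(ptr));
    // Correct form: the 64-bit "l" constraint preserves the full pointer.
    // asm("ld.global.u32 %0, [%1];" : "=r"(out) : "l"(ptr));
    return out;
}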

Multiple Definitions

A function that declares a PTX variable and is inlined repeatedly will cause strange errors, because the variable declaration ends up duplicated:
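Again a hypothetical sketch rather than SCALE's own sample (readFlag and flag are invented names): because the function is force-inlined and called twice, the .shared declaration inside the asm block is emitted once per call site, and ptxas then complains about a duplicate definition.

duplicate_definition.cu
__device__ __forceinline__ unsigned readFlag() {
    unsigned out;
    // The asm block declares a named PTX variable. Every inlined copy of
    // this function re-emits the declaration into the surrounding kernel.
    asm(".shared .u32 flag;\n\t"
        "ld.shared.u32 %0, [flag];" : "=r"(out));
    return out;
}

__global__ void kernel(unsigned *results) {
    results[0] = readFlag();
    results[1] = readFlag();  // second inlined copy duplicates the declaration of `flag`
}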

CUDA → SCALE Comparison
example.cu
__device__ int ptxAdd(int x, int y) {
    int out;
    asm("add.u32 %0, %1, %2" : "=r"(out) : "r"(x), "r"(y));
    return out;
}
SCALE
$nvcc example.cu -o example
error: missing semicolon in inline PTX
    4 |    asm("add.u32 %0, %1, %2" : "=r"(out) : "r"(x), "r"(y));
      |                         ^
nvcc
$nvcc example.cu -o example
ptxas /tmp/tmpxft_001e4e3c_00000000-6_add.ptx, line 28; fatal   : Parsing error near 'st': syntax error
ptxas fatal   : Ptx assembly aborted due to errors
View more code samples in our Documentation
Key Features

Compiler Feedback You'll Actually Love

Get clear, actionable diagnostics that help you pinpoint issues faster. If you've ever been stumped by a cryptic nvcc error, we're sorry and we feel you.

CUDA → SCALE Comparison
deviceinfo.cu
#include <cstdio>
__global__ void hello() {
    printf("Hello, world\n");
}

int main() {
    cudaDeviceProp prop;
    cudaGetDeviceProperties(&prop, 0);

    printf("CUDA Device: %s\n", prop.name);

    hello<<<1,1>>>();
    cudaDeviceSynchronize();
}
SCALE
$nvcc deviceinfo.cu -o deviceinfo
deviceinfo.cu:9:5: warning: ignoring return value of function declared with 'nodiscard' attribute [-Wunused-result]
    9 |     cudaGetDeviceProperties(&prop, 0);
      |     ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
deviceinfo.cu:14:5: warning: ignoring return value of function declared with 'nodiscard' attribute [-Wunused-result]
   14 |     cudaDeviceSynchronize();
      |     ^~~~~~~~~~~~~~~~~~~~~~~
2 warnings generated when compiling for gfx90a.
deviceinfo.cu:9:5: warning: ignoring return value of function declared with 'nodiscard' attribute [-Wunused-result]
    9 |     cudaGetDeviceProperties(&prop, 0);
      |     ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
deviceinfo.cu:14:5: warning: ignoring return value of function declared with 'nodiscard' attribute [-Wunused-result]
   14 |     cudaDeviceSynchronize();
      |     ^~~~~~~~~~~~~~~~~~~~~~~
2 warnings generated when compiling for host.
$./deviceinfo
CUDA Device: AMD Instinct MI210 - gfx90a (AMD) <amdgcn-amd-amdhsa--gfx90a:sramecc+:xnack->
Hello, world
nvcc
$nvcc deviceinfo.cu -o deviceinfo
nvcc warning : Support for offline compilation for architectures prior to '<compute/sm/lto>_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning).
$./deviceinfo
CUDA Device: NVIDIA GeForce RTX 3080 Ti
Hello, world
View more code samples in our Documentation
Licensing

Free for non-commercial purposes.

Paid licenses for commercial use; design partnerships and support are also available.

Free

Paid

Research & Non-Commercial

For non-commercial, educational, and research purposes on all client, workstation and data-center GPUs.

Get Started

Commercial

Standard license for commercial deployment and use. Contact us for pricing.

Contact Sales

Enterprise

Collaborate with our team on custom solutions, optimizations, dedicated support, and roadmap prioritization.

Contact Sales

Unlock Your Potential with SCALE

Discover how our innovative solutions can transform your workflow and enhance your productivity today.

SCALE's Blog


Running GPU-Optimised Monte Carlo (GOMC) on an AMD GPU using SCALE

A look at the experience of using SCALE to build and run an existing CUDA-enabled project called GOMC.

  • AMD
  • Developer Experience
  • CUDA

Socials

We're also on other platforms. Connect with us wherever you prefer.

SCALE Community

Join us on our Discord server: Chat with the team, get help, and see what others are building.

Join the discussion

r/CUDAUnlocked

A community dedicated to running CUDA code on any GPU and accelerator platform.

Join the subreddit

@SpectralCom

Follow us on X (formerly Twitter) for the latest updates, news, and insights from the SCALE team.

Follow us

Our Professional Hub

Follow our page for official company news, industry insights, and career opportunities at the forefront of hardware freedom.

Follow us