Write CUDA. Run Everywhere.
Your CUDA skills are now universal. SCALE compiles your unmodified applications to run natively on any accelerator, ending the nightmare of maintaining multiple codebases.
nvcc my_app.cu -o my_app_nvidia
nvcc my_app.cu -o my_app_portable
True Compilation, Not Emulation
SCALE compiles CUDA source code directly to native machine instructions for non-NVIDIA GPUs, delivering native performance with no intrinsic overhead.
Why SCALE Over Other Solutions?
| | Our Approach | Auto Source-to-Source: HIPIFY | Alternative Languages: OpenCL |
|---|---|---|---|
| Codebase | Single CUDA codebase | Two+ codebases to maintain | Complete rewrite needed |
| Process | Direct compilation | Fragile source translation | New language, new ecosystem |
| Result | "Just make CUDA work" | A "compatibility tax" on developers | Abandons existing CUDA investment |
Native Performance on AMD Hardware
SCALE often outperforms existing solutions, and we're just getting started.
Rodinia Benchmarks: SCALE vs HIP
Speed-up over HIP on AMD Instinct MI300x
Fixing Common PTX Pitfalls
Inline PTX asm is common in CUDA programs because it is the only way to access certain valuable hardware features. However, NVIDIA's compiler provides virtually no validation for this part of the language. Since we have to parse inline PTX in order to compile it for AMD GPUs, we also emit proper warnings and errors, making this dark corner of the language much easier to work with.
Trivial Mistakes
Even trivial mistakes are a pain with NVCC:
Truncated Pointer
A common mistake is to pass a C++ pointer directly into a PTX asm block:
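A minimal sketch of the mistake (the function and variable names here are illustrative, not taken from the original page): the `"r"` constraint binds a 32-bit register, so a 64-bit pointer passed through it is silently truncated. The 64-bit `"l"` constraint is what such code needs.

```cuda
__device__ void storeFlag(int *ptr, int value) {
    // Bug: "r" is a 32-bit register constraint, so the 64-bit
    // pointer in ptr is truncated before the store.
    // The fix is to bind the pointer with the 64-bit "l" constraint:
    //   asm("st.global.u32 [%0], %1;" :: "l"(ptr), "r"(value));
    asm("st.global.u32 [%0], %1;" :: "r"(ptr), "r"(value));
}
```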
Multiple Definitions
A function that declares a PTX variable but is inlined repeatedly will cause strange errors due to the variable declaration being duplicated:
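A hedged sketch of the pattern being described (the names are illustrative): every inlining site re-emits the `.global` declaration from the asm string, so the assembler sees the same PTX variable defined more than once.

```cuda
__device__ __forceinline__ unsigned readCounter() {
    unsigned out;
    // The ".global .u32 counter;" declaration is duplicated at each
    // point where this function is inlined, producing confusing
    // "duplicate definition"-style errors at assembly time.
    asm(".global .u32 counter;\n\t"
        "ld.global.u32 %0, [counter];" : "=r"(out));
    return out;
}
```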
For example, a missing semicolon at the end of the PTX string:

__device__ int ptxAdd(int x, int y) {
    int out;
    asm("add.u32 %0, %1, %2" : "=r"(out) : "r"(x), "r"(y));
    return out;
}
SCALE reports:

error: missing semicolon in inline PTX
4 | asm("add.u32 %0, %1, %2" : "=r"(out) : "r"(x), "r"(y));
  | ^
NVCC's ptxas, by contrast, reports:

ptxas /tmp/tmpxft_001e4e3c_00000000-6_add.ptx, line 28; fatal : Parsing error near 'st': syntax error
ptxas fatal : Ptx assembly aborted due to errors
Compiler Feedback You'll Actually Love
Get clear, actionable diagnostics that help you pinpoint issues faster. If you've ever been stumped by a cryptic nvcc error, we're sorry and we feel you.
#include <cstdio>

__global__ void hello() {
    printf("Hello, world\n");
}

int main() {
    cudaDeviceProp prop;
    cudaGetDeviceProperties(&prop, 0);
    printf("CUDA Device: %s\n", prop.name);
    hello<<<1,1>>>();
    cudaDeviceSynchronize();
}
deviceinfo.cu:9:5: warning: ignoring return value of function declared with 'nodiscard' attribute [-Wunused-result]
9 | cudaGetDeviceProperties(&prop, 0);
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
deviceinfo.cu:14:5: warning: ignoring return value of function declared with 'nodiscard' attribute [-Wunused-result]
14 | cudaDeviceSynchronize();
| ^~~~~~~~~~~~~~~~~~~~~~~
2 warnings generated when compiling for gfx90a.
deviceinfo.cu:9:5: warning: ignoring return value of function declared with 'nodiscard' attribute [-Wunused-result]
9 | cudaGetDeviceProperties(&prop, 0);
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
deviceinfo.cu:14:5: warning: ignoring return value of function declared with 'nodiscard' attribute [-Wunused-result]
14 | cudaDeviceSynchronize();
| ^~~~~~~~~~~~~~~~~~~~~~~
2 warnings generated when compiling for host.
CUDA Device: AMD Instinct MI210 - gfx90a (AMD) <amdgcn-amd-amdhsa--gfx90a:sramecc+:xnack->
Hello, world
nvcc warning : Support for offline compilation for architectures prior to '<compute/sm/lto>_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning).
CUDA Device: NVIDIA GeForce RTX 3080 Ti
Hello, world
Free for non-commercial use. Paid licenses for commercial use, with design partnership and support available.
Research & Non-Commercial
For non-commercial, educational, and research purposes on all client, workstation and data-center GPUs.
Commercial
Standard license for commercial deployment and use. Contact us for pricing.
Enterprise
Collaborate with our team on custom solutions, optimizations, dedicated support, and roadmap prioritization.
Unlock Your Potential with SCALE
Discover how our innovative solutions can transform your workflow and enhance your productivity today.
Running GPU-Optimised Monte Carlo (GOMC) on an AMD GPU using SCALE
A look at the experience of using SCALE to build and run GOMC, an existing CUDA-enabled project.
- AMD
- Developer Experience
- CUDA
Socials
We're also on other platforms. Connect with us everywhere else.
SCALE Community
Join us on our Discord server: Chat with the team, get help, and see what others are building.
Join the discussion
r/CUDAUnlocked
A community dedicated to running CUDA code on any GPU and accelerated platforms.
Join the subreddit
@SpectralCom
Follow us on X (formerly Twitter) for the latest updates, news, and insights from the SCALE team.
Follow us
Our Professional Hub
Follow our page for official company news, industry insights, and career opportunities at the forefront of hardware freedom.
Follow us