Write CUDA Code. Run Everywhere.
Your CUDA skills are now universal. SCALE compiles your unmodified applications to run natively on any accelerator, ending the nightmare of maintaining multiple codebases.
```shell
nvcc my_app.cu -o my_app_nvidia
nvcc my_app.cu -o my_app_portable
```

What is SCALE?
Decoupling Code from Silicon.
CPU developers don't rewrite their software for every new chip architecture—they simply recompile. SCALE brings this standard of portability to GPU computing.
It is a comprehensive toolkit—combining a cross-compiler, drop-in libraries, and language extensions—that acts as an agnostic interface between your code and the hardware.
With SCALE, you can take your existing HPC applications (starting with CUDA) and deploy them to your accelerated compute platform of choice.
Your CUDA codebase is the single source of truth with zero porting required.
Unlock new efficiency gains through advanced software optimization, not just hardware upgrades.
Access enhanced UX and language extensions built by HPC developers, for HPC developers.
Break vendor lock-in and choose hardware based on lower cost or higher performance.
True Compilation, Not Emulation
SCALE compiles CUDA source code directly to native machine instructions for GPUs, delivering native performance with no intrinsic overhead.
Why SCALE Over Other Solutions?
| | Our Approach | Auto Source-to-Source: HIPIFY | Alternative Languages: OpenCL |
|---|---|---|---|
| Codebase | Single CUDA codebase | Two+ codebases to maintain | Complete rewrite needed |
| Process | Direct compilation | Fragile source translation | New language, new ecosystem |
| Result | “Just make CUDA work” | A “compatibility tax” on developers | Abandons existing CUDA investment |
Native Performance on AMD Hardware
SCALE often outperforms existing solutions, and we're just getting started.
Rodinia Benchmarks: SCALE vs HIP
Speed-up over HIP on AMD Instinct MI300x
Fixing Common PTX Pitfalls
Inline PTX asm is common in CUDA programs, because it is the only way to access certain valuable features. However, NVIDIA's compiler provides virtually no validation for this part of the language. Since we have to parse it to compile it for AMD, we also provide proper warnings/errors, making this dark corner of the language much easier to work with.
Trivial Mistakes
Even trivial mistakes are a pain with NVCC:
Truncated Pointer
A common mistake is to pass a 64-bit C++ pointer into a PTX asm block through a 32-bit register constraint, silently truncating the address.
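A minimal sketch of this pitfall (hypothetical kernel, not taken from the SCALE docs): the `"r"` constraint maps to a 32-bit PTX register, so passing a pointer through it truncates the address, while the 64-bit `"l"` constraint is what pointers need.

```cuda
__global__ void writeAnswer(int *ptr) {
    // BUG: "r" is a 32-bit register constraint, so the 64-bit pointer
    // is truncated before the store. NVCC accepts this silently.
    asm volatile("st.global.u32 [%0], %1;" :: "r"(ptr), "r"(42) : "memory");

    // Fix: "l" is the 64-bit register constraint appropriate for pointers.
    asm volatile("st.global.u32 [%0], %1;" :: "l"(ptr), "r"(42) : "memory");
}
```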
Multiple Definitions
A function that declares a PTX variable but is inlined repeatedly will cause strange errors, because each inlined copy duplicates the variable declaration.
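A sketch of how this arises, using a hypothetical helper: the asm block declares a named PTX register, so once the function is inlined more than once into the same caller, the declaration is emitted twice in one function body.

```cuda
__device__ __forceinline__ int timesTwo(int x) {
    int out;
    // Declares a named PTX register inside the asm block. Each inlined
    // copy re-emits ".reg .u32 scratch;", so ptxas rejects the
    // duplicate declaration when the caller inlines this twice.
    asm(".reg .u32 scratch;\n\t"
        "add.u32 scratch, %1, %1;\n\t"
        "mov.u32 %0, scratch;"
        : "=r"(out) : "r"(x));
    return out;
}

__global__ void kernel(int *p) {
    // Two inlined copies -> duplicate definition of `scratch`.
    p[0] = timesTwo(p[0]);
    p[1] = timesTwo(p[1]);
}
```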
Missing Semicolon

Even a PTX statement missing its trailing semicolon produces wildly different diagnostics under each toolchain:

```cuda
__device__ int ptxAdd(int x, int y) {
    int out;
    // Note: the PTX statement below is missing its trailing semicolon.
    asm("add.u32 %0, %1, %2" : "=r"(out) : "r"(x), "r"(y));
    return out;
}
```

SCALE points straight at the problem:

```
error: missing semicolon in inline PTX
    4 |     asm("add.u32 %0, %1, %2" : "=r"(out) : "r"(x), "r"(y));
      |     ^
```

NVCC, by contrast, fails deep inside ptxas with an unrelated-looking message:

```
ptxas /tmp/tmpxft_001e4e3c_00000000-6_add.ptx, line 28; fatal : Parsing error near 'st': syntax error
ptxas fatal : Ptx assembly aborted due to errors
```
Compiler Feedback You'll Actually Love
Get clear, actionable diagnostics that help you pinpoint issues faster. If you've ever been stumped by a cryptic nvcc error, we're sorry and we feel you.
```cuda
#include <cstdio>

__global__ void hello() {
    printf("Hello, world\n");
}

int main() {
    cudaDeviceProp prop;
    cudaGetDeviceProperties(&prop, 0);
    printf("CUDA Device: %s\n", prop.name);
    hello<<<1,1>>>();
    cudaDeviceSynchronize();
}
```
Compiling and running with SCALE on an AMD GPU yields clear, clang-style warnings:

```
deviceinfo.cu:9:5: warning: ignoring return value of function declared with 'nodiscard' attribute [-Wunused-result]
    9 |     cudaGetDeviceProperties(&prop, 0);
      |     ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
deviceinfo.cu:14:5: warning: ignoring return value of function declared with 'nodiscard' attribute [-Wunused-result]
   14 |     cudaDeviceSynchronize();
      |     ^~~~~~~~~~~~~~~~~~~~~~~
2 warnings generated when compiling for gfx90a.
deviceinfo.cu:9:5: warning: ignoring return value of function declared with 'nodiscard' attribute [-Wunused-result]
    9 |     cudaGetDeviceProperties(&prop, 0);
      |     ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
deviceinfo.cu:14:5: warning: ignoring return value of function declared with 'nodiscard' attribute [-Wunused-result]
   14 |     cudaDeviceSynchronize();
      |     ^~~~~~~~~~~~~~~~~~~~~~~
2 warnings generated when compiling for host.
CUDA Device: AMD Instinct MI210 - gfx90a (AMD) <amdgcn-amd-amdhsa--gfx90a:sramecc+:xnack->
Hello, world
```

The same source built with NVCC on an NVIDIA GPU:

```
nvcc warning : Support for offline compilation for architectures prior to '<compute/sm/lto>_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning).
CUDA Device: NVIDIA GeForce RTX 3080 Ti
Hello, world
```
Free for non-commercial purposes.
Paid license for commercial use; available for design partnership and support.
Research & Non-Commercial
For non-commercial, educational, and research purposes on all client, workstation and data-center GPUs.
Commercial
Standard license for commercial deployment and use. Contact us for pricing.
Enterprise
Collaborate with our team on custom solutions, optimizations, dedicated support, and roadmap prioritization.
Frequently Asked Questions
If this section doesn't answer your question, please check the FAQ in the official documentation or reach out to us on social media.
SCALE is free for non-commercial use including research and academia. For commercial use, a license agreement is required. Read more here.
SCALE does not currently support PyTorch, but support is in development and estimated for release in early Q2 2026. For a detailed overview of the currently supported CUDA projects, see this table from our validation suite.
SCALE supports a wide range of both consumer and enterprise GPUs, and will support more in the future. For a detailed overview, see this section of the official SCALE documentation.
In many cases, yes, it does. Reducing compute costs can be a good reason to choose SCALE. For the latest performance benchmarks, see this section of our website.
SCALE is centered around CUDA: it lets you write your code once and run it everywhere, with zero rewrites. It is a drop-in replacement for nvcc. For a full explanation of SCALE's differentiators, see this section of our technical documentation.
By design, SCALE does not infringe NVIDIA’s EULAs or copyright. We think CUDA is amazing and we follow the guidelines set by NVIDIA. Check out this post for more information.
Unlock Your Potential with SCALE
Discover how our innovative solutions can transform your workflow and enhance your productivity today.
Running GPU-Optimised Monte Carlo (GOMC) on an AMD GPU using SCALE
A look at the experience of using SCALE to build and run GOMC, an existing CUDA-enabled project.
- AMD
- Developer Experience
- CUDA
Socials
We're also on other platforms. Connect with us everywhere else.
SCALE Community
Join us on our Discord server: Chat with the team, get help, and see what others are building.
Join the discussion

r/CUDAUnlocked
A community dedicated to running CUDA code on any GPU and accelerated platforms.
Join the subreddit

@SpectralCom
Follow us on X (formerly Twitter) for the latest updates, news, and insights from the SCALE team.
Follow us

Our Professional Hub

Follow our page for official company news, industry insights, and career opportunities at the forefront of hardware freedom.
Follow us