Write CUDA. Run Everywhere.
Your CUDA skills are now universal. SCALE compiles your unmodified applications to run natively on any accelerator, ending the nightmare of maintaining multiple codebases.
nvcc my_app.cu -o my_app_nvidia
nvcc my_app.cu -o my_app_portable
True Compilation, Not Emulation
SCALE compiles CUDA source code directly to native machine instructions for non-NVIDIA GPUs, delivering native performance with no intrinsic overhead.
Why SCALE Over Other Solutions?
| | Our Approach | Auto Source-to-Source: HIPIFY | Alternative Languages: OpenCL |
|---|---|---|---|
| Codebase | Single CUDA codebase | Two+ codebases to maintain | Complete rewrite needed |
| Process | Direct compilation | Fragile source translation | New language, new ecosystem |
| Result | "Just make CUDA work" | A "compatibility tax" on developers | Abandons existing CUDA investment |
Native Performance on AMD Hardware
SCALE often outperforms existing solutions, and we're just getting started.
Rodinia Benchmarks: SCALE vs HIP
Speed-up over HIP on AMD Instinct MI300x
Fixing Common PTX Pitfalls
Inline PTX asm is common in CUDA programs because it is the only way to access certain valuable hardware features. However, NVIDIA's compiler provides virtually no validation for this part of the language. Since we have to parse inline PTX in order to compile it for AMD GPUs, we also emit proper warnings and errors, making this dark corner of the language much easier to work with.
Trivial Mistakes
Even trivial mistakes are a pain with NVCC:
Truncated Pointer
A common mistake is to pass a C++ pointer directly into a PTX asm block:
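A minimal sketch of the mistake (the function and variable names here are illustrative, not taken from the original page): the `"r"` constraint binds a 32-bit register, so a 64-bit pointer passed through it is silently truncated. The 64-bit `"l"` constraint is what such code needs.

```cuda
__device__ void storeFlag(int *ptr, int value) {
    // Bug: "r" is a 32-bit register constraint, so the 64-bit
    // pointer in ptr is truncated before the store.
    // The fix is to bind the pointer with the 64-bit "l" constraint:
    //   asm("st.global.u32 [%0], %1;" :: "l"(ptr), "r"(value));
    asm("st.global.u32 [%0], %1;" :: "r"(ptr), "r"(value));
}
```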
Multiple Definitions
A function that declares a PTX variable but is inlined repeatedly will cause strange errors due to the variable declaration being duplicated:
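A hedged sketch of the pattern being described (the names are illustrative): every inlining site re-emits the `.global` declaration from the asm string, so the assembler sees the same PTX variable defined more than once.

```cuda
__device__ __forceinline__ unsigned readCounter() {
    unsigned out;
    // The ".global .u32 counter;" declaration is duplicated at each
    // point where this function is inlined, producing confusing
    // "duplicate definition"-style errors at assembly time.
    asm(".global .u32 counter;\n\t"
        "ld.global.u32 %0, [counter];" : "=r"(out));
    return out;
}
```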
For example, a missing semicolon at the end of the PTX string:

__device__ int ptxAdd(int x, int y) {
    int out;
    asm("add.u32 %0, %1, %2" : "=r"(out) : "r"(x), "r"(y));
    return out;
}
SCALE reports:

error: missing semicolon in inline PTX
4 | asm("add.u32 %0, %1, %2" : "=r"(out) : "r"(x), "r"(y));
  | ^
NVCC's ptxas, by contrast, reports:

ptxas /tmp/tmpxft_001e4e3c_00000000-6_add.ptx, line 28; fatal : Parsing error near 'st': syntax error
ptxas fatal : Ptx assembly aborted due to errors
Compiler Feedback You'll Actually Love
Get clear, actionable diagnostics that help you pinpoint issues faster. If you've ever been stumped by a cryptic nvcc error, we're sorry and we feel you.
#include <cstdio>

__global__ void hello() {
    printf("Hello, world\n");
}

int main() {
    cudaDeviceProp prop;
    cudaGetDeviceProperties(&prop, 0);
    printf("CUDA Device: %s\n", prop.name);
    hello<<<1,1>>>();
    cudaDeviceSynchronize();
}
deviceinfo.cu:9:5: warning: ignoring return value of function declared with 'nodiscard' attribute [-Wunused-result]
9 | cudaGetDeviceProperties(&prop, 0);
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
deviceinfo.cu:14:5: warning: ignoring return value of function declared with 'nodiscard' attribute [-Wunused-result]
14 | cudaDeviceSynchronize();
| ^~~~~~~~~~~~~~~~~~~~~~~
2 warnings generated when compiling for gfx90a.
deviceinfo.cu:9:5: warning: ignoring return value of function declared with 'nodiscard' attribute [-Wunused-result]
9 | cudaGetDeviceProperties(&prop, 0);
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
deviceinfo.cu:14:5: warning: ignoring return value of function declared with 'nodiscard' attribute [-Wunused-result]
14 | cudaDeviceSynchronize();
| ^~~~~~~~~~~~~~~~~~~~~~~
2 warnings generated when compiling for host.
CUDA Device: AMD Instinct MI210 - gfx90a (AMD) <amdgcn-amd-amdhsa--gfx90a:sramecc+:xnack->
Hello, world
nvcc warning : Support for offline compilation for architectures prior to '<compute/sm/lto>_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning).
CUDA Device: NVIDIA GeForce RTX 3080 Ti
Hello, world
Free for non-commercial use. Paid licenses for commercial use, with design partnership and support available.
Research & Non-Commercial
For non-commercial, educational, and research purposes on all client, workstation and data-center GPUs.
Commercial
Standard license for commercial deployment and use. Contact us for pricing.
Enterprise
Collaborate with our team on custom solutions, optimizations, dedicated support, and roadmap prioritization.
Unlock Your Potential with SCALE
Discover how our innovative solutions can transform your workflow and enhance your productivity today.
Running GPU-Optimised Monte Carlo (GOMC) on an AMD GPU using SCALE
A look at the experience of using SCALE to build and run GOMC, an existing CUDA-enabled project.
- AMD
- Developer Experience
- CUDA
Socials
We're also on other platforms. Connect with us everywhere else.
SCALE Community
Join us on our Discord server: Chat with the team, get help, and see what others are building.
Join the discussion
r/CUDAUnlocked
A community dedicated to running CUDA code on any GPU and accelerated platforms.
Join the subreddit
@SpectralCom
Follow us on X (formerly Twitter) for the latest updates, news, and insights from the SCALE team.
Follow us
Our Professional Hub
Follow our page for official company news, industry insights, and career opportunities at the forefront of hardware freedom.
Follow us