At EXXA, we are building the most cost-efficient, high-throughput
AI infrastructure for large-scale, asynchronous workloads.
Our mission is to balance Gen-AI demand and processing supply by
leveraging idle GPUs, optimizing batch inference, and pushing the
limits of AI model inference efficiency.
If you are passionate about open-source AI, obsessed with performance,
and love tackling complex technical challenges, we want to hear from you!
EXXA is hiring a Parallel Programming Expert (CUDA/AVX)
to join the team developing the EXXA inference engine, focusing
on batch processing and throughput rather than
low-latency constraints.
Key responsibilities:
- Contribute to the development of the EXXA inference engine
- Profile and optimize the inference engine
- Design and implement efficient inference kernels for GPU and CPU
- Benchmark and validate performance improvements
Qualifications:
- Proven expertise in parallel programming using CUDA and/or SIMD instructions
- Experience with performance optimization and profiling
- Familiarity with Triton kernel language and/or MLIR/XLA intermediate representation is a plus
- Proficiency in C++ or Rust
- Knowledge of Python ML stack (PyTorch, HuggingFace, etc.)
- 2-3+ years of experience in high-performance computing or a similar field
Why you should join us:
🚀 Technical innovation
- We are tackling massive technical challenges to make Gen-AI inference infrastructure more efficient and to advance throughput-optimized computing.
🌐 Remote first
- We are a fully distributed team. Work from anywhere within European time zones.
💸 Competitive compensation and benefits
- Competitive salary
- Early-stage stock options
- Private health insurance
- 30+ paid holidays
- Top-notch hardware and equipment
🙏 Backed by the best
- We are funded by leading VCs and top business angels (announcement coming soon).
Any questions?
Even if you don't meet every qualification, we encourage you to apply.
If you have any questions, contact us at careers@withexxa.com.