About exxa
exxa's mission: Empower businesses and people to run frontier AI models on hardware they own.
We're democratizing access to state-of-the-art generative AI through self-hosted solutions that free companies and individuals from cloud dependencies. Whether on your AI servers, your laptop, or an edge device, we're making frontier-level AI assistants accessible on hardware you control.
To achieve this, we're building a world-class research team to push the boundaries of efficient deployment through advanced model compression, novel architectures optimized for constrained environments, and cutting-edge inference acceleration. We're not just making AI smaller and faster; we're designing it to efficiently leverage hardware resources while maintaining frontier-level capabilities.
If you're excited to work in a fast-paced environment where your research could directly enable AI ownership for millions, we'd love to hear from you!
About the internship
- Duration: 6 months
- Location: Paris/Remote (European timezones)
- Compensation: Competitive internship compensation, with potential for full-time conversion
- Autonomy: You will be expected to work autonomously on your projects, under the supervision of exxa's CTO, Etienne Balit
- Academic collaboration: Opportunity to continue your research through the CIFRE program
What you'll do
- Research state-of-the-art algorithms for speculative decoding and quantized inference
- Train draft models tailored for speculative decoding
- Develop benchmarks and evaluation metrics for draft-model quality and speculative decoding efficiency
- Contribute to publications and open-source projects like vLLM and SGLang
- Contribute to the design of novel model architectures optimized for constrained hardware
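If speculative decoding is new to you, the core loop is easy to sketch: a small draft model cheaply proposes several tokens, and the large target model verifies them, accepting the longest agreeing prefix. The toy sketch below shows a simplified greedy-verification variant; all names and the stand-in "models" are illustrative, not exxa code (production variants verify with one batched target pass and use rejection sampling to match the target distribution).

```python
# Toy sketch of speculative decoding with greedy verification.
# Real systems use a small draft LM and a large target LM; here both
# are stand-in functions mapping a token context to the next token id.

def speculative_step(context, draft_next, target_next, k=4):
    """Propose k tokens with the draft model, then accept the longest
    prefix the target model agrees with, plus one target token."""
    # 1. Draft model proposes k tokens autoregressively (cheap).
    proposal = []
    ctx = list(context)
    for _ in range(k):
        t = draft_next(ctx)
        proposal.append(t)
        ctx.append(t)

    # 2. Target model verifies the proposals (a single batched
    #    forward pass in practice; sequential here for clarity).
    accepted = []
    ctx = list(context)
    for t in proposal:
        expected = target_next(ctx)
        if expected == t:
            accepted.append(t)
            ctx.append(t)
        else:
            accepted.append(expected)  # target's correction ends the step
            break
    else:
        accepted.append(target_next(ctx))  # all accepted: emit a bonus token
    return accepted

# Toy "models": the draft increments the last token; the target does the
# same except it insists every 3rd position is 0, so they sometimes disagree.
draft = lambda ctx: (ctx[-1] + 1) % 10
target = lambda ctx: 0 if len(ctx) % 3 == 0 else (ctx[-1] + 1) % 10
print(speculative_step([1, 2], draft, target, k=4))  # draft accepted until the target corrects
```

When draft and target agree often, each step emits several tokens for roughly the cost of one target forward pass, which is the speedup the internship's draft-model training aims to maximize.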
About you
Required qualifications:
- Currently pursuing or recently completed a Master's in Computer Science, Machine Learning, or a related field from a top-tier engineering school or university. We will prioritize candidates in their final Master's year.
- Strong mathematical foundation and programming skills, preferably in Python, with experience in PyTorch, JAX, or similar ML frameworks
- Proven ability to implement complex algorithms and conduct rigorous experimental validation
- Strong understanding of transformer and LLM architectures and training techniques
Preferred qualifications:
- Experience in LLMs and/or VLMs
- Knowledge of speculative decoding, quantization, or other inference acceleration and model compression techniques
- Experience with HPC infrastructure and distributed training/inference systems
- Familiarity with inference frameworks (vLLM, SGLang, TensorRT-LLM, etc.)
Why you should join us
Join us to bring full AI ownership to everyone! Work on frontier research in efficient model deployment, and solve hard problems with a fun, collaborative team. We're backed by top-tier VCs and offer access to advanced hardware in a low-ego environment where your work makes a real impact.
Application process
To apply, please send an email to careers@withexxa.com with your CV and anything else you think is relevant to your application (e.g. past projects, publications, a motivation letter).