What You'll Do:
At Criteo, our Platform Core teams build the foundational services that power our global advertising platform.
We design and operate scalable, resilient systems that support real-time decision-making and data processing at massive scale.
As we expand our capabilities in high-performance inference and distributed computing, we’re forming a new team focused on GPU-powered services and cutting-edge ML serving technologies.
As a Site Reliability Engineer in this new team, you’ll be at the forefront of building and operating GPU-powered services for machine learning workloads.
Your mission will be to ensure the reliability, scalability, and performance of our systems that leverage:
Ray: You’ll manage on-demand provisioning of Ray clusters on Kubernetes, enabling scalable distributed computing as a service for ML training and inference.
You’ll design, maintain, and monitor these Ray-as-a-service systems and deliver them as robust, self-service platform offerings.
Nvidia Triton Inference Server: You’ll optimize and operate high-performance inference services using Triton, ensuring low-latency and high-throughput serving of deep learning models.
You’ll work closely with ML engineers, data scientists, and other infrastructure teams to deliver production-grade services that accelerate innovation across Criteo.
Who You Are:
Strong experience with Kubernetes, especially in dynamic provisioning and custom operators.
Hands-on experience with GPU workloads, ideally in ML training or inference contexts.
Solid programming skills in C#, Python, Go, or similar languages.
Passion for automation, observability, and building reliable services.
Bonus Points:
Familiarity with Ray or other distributed computing frameworks.
Knowledge of Nvidia Triton, TensorRT, or similar inference serving technologies.
Familiarity with cloud-native GPU orchestration (e.g., GKE, EKS, or on-prem equivalents).
Take a look at for access and insight into our engineering culture and achievements.
We understand that you might not meet every requirement listed above, or that your experience may differ a little from our specifications.
If you think that you can still bring value to the role, we want to hear from you.