Scalable code generation and analysis RL environment for large language model (LLM) training at INRIA

Job Overview

Company

INRIA

Location

Villeneuve-d'Ascq

Job Description

Context and advantages of the position

You will work within the MAGNET team at Inria Lille, more specifically on the Reasoning Core project. Funded by ANR Adada, the work involves collaborating with the PI, a PhD student, and an engineer.

Assigned mission

The internship will focus on the design of a reinforcement learning environment for procedural task generation in code completion and understanding.

Unlike existing benchmarks that rely on static datasets, the objective is to create renewable, verifiable tasks with an adaptive difficulty knob and curriculum learning.
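As an illustration of what "renewable, verifiable tasks with a difficulty knob" could mean in practice, here is a minimal sketch. It is not the project's actual design: the function names `generate_task` and `score`, the task format, and the choice of "number of assignments" as the difficulty knob are all assumptions made for this example. Each seed yields a fresh program, and the reference answer is obtained by executing the generated code rather than by storing a static label.

```python
import random


def generate_task(difficulty: int, seed: int) -> dict:
    """Generate a straight-line Python program and its verified result.

    `difficulty` is a hypothetical knob: it sets the number of
    assignment statements that follow the initial one.
    """
    rng = random.Random(seed)
    lines = [f"x0 = {rng.randint(1, 9)}"]
    for i in range(1, difficulty + 1):
        op = rng.choice(["+", "-", "*"])
        prev = rng.randrange(i)  # refer back to any earlier variable
        lines.append(f"x{i} = x{prev} {op} {rng.randint(1, 9)}")
    program = "\n".join(lines)

    # The ground truth comes from actually running the generated program.
    namespace = {}
    exec(program, {}, namespace)
    target = f"x{difficulty}"
    return {
        "program": program,
        "question": f"What is the value of {target}?",
        "answer": namespace[target],
    }


def score(task: dict, model_answer: str) -> float:
    """Return 1.0 if the answer matches the executed result, else 0.0."""
    try:
        return float(int(model_answer.strip()) == task["answer"])
    except ValueError:
        return 0.0
```

Because the generator is seeded, the same `(difficulty, seed)` pair always reproduces the same task, which keeps evaluation repeatable while new seeds provide new instances.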

This will involve:

Studying existing environments for procedural reasoning and code generation, such as Reasoning Gym (Shen et al., Reasoning Gym: Procedurally Generated Reasoning Environments for Pretraining and Evaluating LLMs, 2024), DELTA-Code (Sun et al., DELTA-Code: A Benchmark for RL-based Algorithm Learning in Code LLMs, ICLR 2025), StepCoder (Dou et al., StepCoder: Improving LLM-Based Code Generation via Step-wise RL, 2024), and RLEF (Gehring et al., Reinforcement Learning from Execution Feedback for Code Generation, NeurIPS 2024).

Identifying gaps in existing approaches, in particular the lack of procedurally generated static analysis tasks (e.g., call-graph construction, error detection, performance profiling).

Proposing new generators for Python code tasks that produce infinite, verifiable problem instances, with evaluation grounded in unit tests, execution traces, or static analyzers.
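To make the idea of a static analysis task with automatic verification concrete, the sketch below builds a reference call graph with Python's standard `ast` module and scores a predicted graph against it. The helper names (`call_graph`, `edge_f1`), the sample program, and the choice of edge-level F1 as the metric are assumptions for illustration, not something specified by the project. The sketch only tracks direct calls by plain name between top-level functions.

```python
import ast


def call_graph(source: str) -> dict:
    """Map each top-level function to the top-level functions it calls."""
    tree = ast.parse(source)
    functions = {n.name: n for n in tree.body if isinstance(n, ast.FunctionDef)}
    graph = {}
    for name, node in functions.items():
        callees = set()
        for sub in ast.walk(node):
            # Only plain-name calls such as `f(x)` are tracked here;
            # method calls and dynamic dispatch are ignored.
            if isinstance(sub, ast.Call) and isinstance(sub.func, ast.Name):
                if sub.func.id in functions:
                    callees.add(sub.func.id)
        graph[name] = callees
    return graph


def edges(graph: dict) -> set:
    """Flatten a caller -> callees mapping into a set of (caller, callee) pairs."""
    return {(caller, callee) for caller, callees in graph.items() for callee in callees}


def edge_f1(predicted: dict, reference: dict) -> float:
    """Score a predicted call graph by F1 over its edges."""
    p, r = edges(predicted), edges(reference)
    if not p and not r:
        return 1.0
    tp = len(p & r)
    if tp == 0:
        return 0.0
    precision, recall = tp / len(p), tp / len(r)
    return 2 * precision * recall / (precision + recall)


SAMPLE = """
def a(x):
    return b(x) + c(x)

def b(x):
    return c(x)

def c(x):
    return x + 1
"""

reference = call_graph(SAMPLE)
```

A graded score such as edge-level F1 gives partial credit for partially correct graphs, which may provide a denser reward signal than exact match; whether that helps training is an open question rather than a claim of this sketch.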

Main activities

Implement procedural generators for code tasks (Python focus), inspired by TinyPy-style grammar synthesis (Naïr et al., Curriculum Learning for Code Execution with Synthetic Python Programs, 2023) and code perturbation methods.

Integrate these generators into a Gym-style RL environment with automatic scoring and curriculum scheduling (easy → hard).

Explore a novel task family (e.g., static code reasoning, code completion, debugging, refactoring, performance analysis, or multi-file repository puzzles) and evaluate whether LLMs can improve through curriculum-based RL.
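The activities above can be pictured with a small sketch of a Gym-style environment that scores answers automatically and raises the difficulty as the success rate improves. Everything here is hypothetical: the class `CurriculumCodeEnv`, the toy generator `make_task`, and the promotion rule (advance once a rolling window of rewards reaches a threshold) are assumptions chosen for brevity. The `reset`/`step` return shapes follow the Gymnasium convention, but the sketch does not depend on that library.

```python
import random
from collections import deque


def make_task(difficulty: int, rng: random.Random):
    """Hypothetical generator: a print statement summing small integers."""
    terms = [rng.randint(1, 9) for _ in range(difficulty + 1)]
    prompt = "print(" + " + ".join(map(str, terms)) + ")"
    return prompt, str(sum(terms))


class CurriculumCodeEnv:
    """One-step episodes: observe a program, answer with its printed output."""

    def __init__(self, max_difficulty=5, window=4, promote_at=0.75, seed=0):
        self.rng = random.Random(seed)
        self.difficulty = 1
        self.max_difficulty = max_difficulty
        self.promote_at = promote_at
        self.recent = deque(maxlen=window)  # rolling record of rewards
        self._answer = None

    def reset(self):
        """Sample a new task at the current difficulty."""
        prompt, self._answer = make_task(self.difficulty, self.rng)
        return prompt, {"difficulty": self.difficulty}

    def step(self, action: str):
        """Score the answer and update the curriculum."""
        reward = 1.0 if action.strip() == self._answer else 0.0
        self.recent.append(reward)
        window_full = len(self.recent) == self.recent.maxlen
        # Promote only once the window is full and the success rate is high.
        if window_full and sum(self.recent) / len(self.recent) >= self.promote_at:
            self.difficulty = min(self.difficulty + 1, self.max_difficulty)
            self.recent.clear()
        terminated = True  # each task is a single-step episode
        return None, reward, terminated, False, {"difficulty": self.difficulty}
```

In a real setup the `action` would be the LLM's completion and `make_task` would be replaced by the procedural generators described above; the promotion threshold and window size would need tuning.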

Skills

Excellent Python coding skills and code understanding

Critical thinking and scientific culture

Good vibe coding capabilities

A degree of perfectionism and a liking for iteration and refinement

Ability to process scientific literature

Benefits

Subsidized meals

Partial reimbursement of public transport costs

Leave: 7 weeks of annual leave + 10 extra days off due to RTT (statutory reduction in working hours) + possibility of exceptional leave (sick children, moving home, etc.)

Possibility of teleworking and flexible organization of working hours

Professional equipment available (videoconferencing, loan of computer equipment, etc.)

Social, cultural and sports events and activities

Access to vocational training

Social security coverage

Remuneration

Internship allowance according to the amount in force
