
PhD Position F/M Physically Grounded Video Generation



Job description

Context and advantages of the position

The PhD will be carried out at Inria, in the Willow research team.

Assignment

Short Overview of the PhD Project:
This PhD thesis aims to enhance the physical consistency of current video generation models by exploring various techniques to inject physics awareness into them.

PhD Project Description:
The motivation for this PhD thesis is to address a critical limitation in current video generation models: their lack of consistency with the laws of physics. Although these models are increasingly adept at generating high-quality content that can almost perfectly match real-world scenes, their capacity to model the underlying laws governing dynamic interactions remains limited [1,2,3,4,6]. Simple scenarios, such as object freefall, are sufficient to demonstrate these limitations [3]. Improving these capabilities is a fundamental step towards building more robust models that can function as true world simulators.
Proposed Research Directions:
Different approaches have been explored to overcome the aforementioned limitations. Some works integrate 3D geometry and dynamics awareness as critical elements for generating physically plausible videos [7]. Another interesting approach is model-based simulation guidance, where physics engine simulations are used as an intermediate step to guide the video generation process [4]. Furthermore, we consider post-training techniques to be particularly promising. In [3], the authors present a two-stage post-training pipeline consisting of self-supervised fine-tuning on high-quality data followed by an Object Reward Optimization (ORO) phase. In [5], the authors propose VideoREPA, a framework that distills physics understanding from video foundation models into text-to-video generation models by aligning token-level relations.
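
To make the relation-alignment idea concrete, here is a minimal, purely illustrative sketch (not the VideoREPA implementation): pairwise cosine similarities between token features of a frozen video understanding model and of the generator are compared, and their mismatch is penalized during fine-tuning. The tensor shapes, the MSE penalty, and the assumption that the two token grids are already spatially aligned are simplifications made for this sketch.

import torch
import torch.nn.functional as F

def relation_matrix(tokens: torch.Tensor) -> torch.Tensor:
    # tokens: (batch, num_tokens, dim) features from one model.
    # Returns (batch, num_tokens, num_tokens) pairwise cosine similarities.
    normed = F.normalize(tokens, dim=-1)
    return normed @ normed.transpose(1, 2)

def relation_alignment_loss(generator_tokens: torch.Tensor,
                            teacher_tokens: torch.Tensor) -> torch.Tensor:
    # Relation matrices are (num_tokens x num_tokens) regardless of feature
    # width, so the two models may use different hidden dimensions as long
    # as their token grids correspond to the same spatio-temporal locations.
    gen_rel = relation_matrix(generator_tokens)
    teacher_rel = relation_matrix(teacher_tokens).detach()  # teacher is frozen
    return F.mse_loss(gen_rel, teacher_rel)

# Toy usage with random features standing in for real activations.
gen = torch.randn(2, 64, 128, requires_grad=True)   # generator features
teacher = torch.randn(2, 64, 256)                    # understanding-model features
loss = relation_alignment_loss(gen, teacher)
loss.backward()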
Building on this, a primary direction for our research is the use of reasoning-capable models, such as Large Language Models (LLMs) or Vision-Language Models (VLMs), to create physically grounded scene descriptions that can guide the video generation process. We hypothesize that this could be a direct way to transfer the reasoning capabilities of understanding models to generative ones. Different settings and formats for this guidance, from free-form text to more structured inputs, will be explored.
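
As an illustration of what such structured guidance could look like, the sketch below prompts a reasoning model to turn a free-form request into a machine-readable scene description, which could then be passed as additional conditioning to a text-to-video model. The prompt wording, the JSON schema, and the stand-in model response are assumptions made for this sketch, not an existing interface.

import json
from typing import Any, Dict

PROMPT_TEMPLATE = """You are a physics-aware scene planner.
Rewrite the request below as JSON with fields:
  objects: list of {{name, mass_kg, initial_position_m, initial_velocity_mps}}
  forces: list of strings
  expected_motion: one sentence per object, consistent with Newtonian mechanics
Request: {request}
Return only JSON."""

def build_physics_prompt(request: str) -> str:
    # Wraps the user's free-form text-to-video request in a grounding prompt
    # intended for an LLM/VLM.
    return PROMPT_TEMPLATE.format(request=request)

def parse_scene_description(model_output: str) -> Dict[str, Any]:
    # Parses the reasoning model's answer into a structured conditioning
    # signal that a video generator could consume alongside the prompt.
    return json.loads(model_output)

# The string below stands in for a real LLM/VLM response.
stand_in_response = json.dumps({
    "objects": [{"name": "red ball", "mass_kg": 0.5,
                 "initial_position_m": [0.0, 2.0, 0.0],
                 "initial_velocity_mps": [0.0, 0.0, 0.0]}],
    "forces": ["gravity"],
    "expected_motion": ["the ball falls with constant acceleration and bounces once"],
})
scene = parse_scene_description(stand_in_response)
prompt = build_physics_prompt("a red ball dropped onto a wooden table")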
Moreover, we aim to investigate post-training techniques based on physics-informed reward methods, such as those presented in [3]. Given that [3] focuses on the specific case of object freefall, a logical first step is to extend this approach to more complex and diverse physical scenarios.
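
For intuition, a physics-informed reward of this kind could, for example, score how closely the tracked vertical trajectory of a generated object follows constant-acceleration free fall, y(t) = y0 + v0*t - (1/2)*g*t^2. The sketch below assumes the object has already been tracked and its height expressed in metres; the exponential squashing of the fitting error is an arbitrary design choice for illustration.

import numpy as np

def freefall_reward(heights_m: np.ndarray, fps: float, g: float = 9.81) -> float:
    # Reward in (0, 1] measuring how well a per-frame vertical trajectory
    # follows constant-acceleration free fall: y(t) = y0 + v0*t - 0.5*g*t^2.
    t = np.arange(len(heights_m)) / fps
    # Remove the known quadratic term, then fit y0 and v0 by least squares.
    target = heights_m + 0.5 * g * t**2
    design = np.stack([np.ones_like(t), t], axis=1)   # columns: y0, v0
    (y0, v0), *_ = np.linalg.lstsq(design, target, rcond=None)
    predicted = y0 + v0 * t - 0.5 * g * t**2
    rmse = float(np.sqrt(np.mean((heights_m - predicted) ** 2)))
    return float(np.exp(-rmse))   # perfect fit -> 1, large deviation -> ~0

# Toy check: a synthetic 1-second drop from 10 m at 24 fps scores ~1.
t = np.arange(24) / 24.0
print(freefall_reward(10.0 - 0.5 * 9.81 * t**2, fps=24.0))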
During the PhD thesis, the initial research directions will be adapted based on the evolution of the field and the insights obtained during experimentation.
Evaluation and Benchmarking:
Recent benchmarks such as VideoPhy-2 [1], Phy-World [2], and PISA [3] are valuable resources for measuring our contributions. However, a key part of this project will also involve identifying the limitations of current benchmarks. Consequently, designing novel tasks and evaluation strategies to better assess physical plausibility is an additional opportunity for contribution within this PhD project.
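
One possible shape for such a custom evaluation, given purely as an illustration, is a small harness that runs several scenario-specific physics checks over tracked object trajectories and reports a per-check plausibility score. The single check shown here is a placeholder; real checks would be scenario-specific and considerably more elaborate.

from dataclasses import dataclass
from typing import Callable, Dict
import numpy as np

# A check maps a tracked trajectory (frames x 3 positions, in metres)
# to a plausibility score in [0, 1]. Real checks would cover free fall,
# collisions, rolling, fluids, and so on.
Check = Callable[[np.ndarray], float]

@dataclass
class ScenarioResult:
    scenario: str
    scores: Dict[str, float]

def constant_acceleration_check(trajectory: np.ndarray) -> float:
    # Placeholder check: the second differences of the height coordinate
    # should be roughly constant for a ballistic object.
    accel = np.diff(trajectory[:, 1], n=2)
    return float(np.exp(-np.std(accel)))

def evaluate_video(trajectory: np.ndarray, scenario: str,
                   checks: Dict[str, Check]) -> ScenarioResult:
    # Scores one generated video against every registered physics check.
    return ScenarioResult(scenario, {name: fn(trajectory) for name, fn in checks.items()})

# Toy usage with a synthetic free-fall trajectory.
t = np.arange(24) / 24.0
toy_trajectory = np.stack([np.zeros_like(t), 10.0 - 0.5 * 9.81 * t**2, np.zeros_like(t)], axis=1)
result = evaluate_video(toy_trajectory, "free_fall",
                        {"constant_acceleration": constant_acceleration_check})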
References:
[1] VideoPhy-2: A Challenging Action-Centric Physical Commonsense Evaluation in Video Generation. H. Bansal, C. Peng, Y. Bitton, R. Goldenberg, A. Grover, K.-W. Chang.
[2] How Far is Video Generation from World Model: A Physical Law Perspective. B. Kang, Y. Yue, R. Lu, Z. Lin, Y. Zhao, K. Wang, G. Huang, J. Feng.
[3] PISA Experiments: Exploring Physics Post-Training for Video Diffusion Models by Watching Stuff Drop. C. Li, O. Michel, X. Pan, S. Liu, M. Roberts, S. Xie.
[4] PhysGen: Rigid-Body Physics-Grounded Image-to-Video Generation. S. Liu, Z. Ren, S. Gupta, S. Wang.
[5] VideoREPA: Learning Physics for Video Generation through Relational Alignment with Foundation Models. X. Zhang, J. Liao, S. Zhang, F. Meng, X. Wan, J. Yan, Y. Cheng.
[6] MotionCraft: Physics-based Zero-Shot Video Generation. L. S. Aira, A. Montanaro, E. Aiello, D. Valsesia, E. Magli.
[7] Towards Physical Understanding in Video Generation: A 3D Point Regularization Approach. Y. Chen, J. Cao, A. Kag, V. Goel, S. Korolev, C. Jiang, S. Tulyakov, J. Ren.

Main activities

  • Analyse and implement related work.

  • Design novel, innovative solutions.

  • Write progress reports and papers.

  • Present work at conferences.

Skills

Technical skills and level required: programming skills are required.


Languages: English and possibly French.


Interpersonal skills: good communication skills.

Benefits

  • Subsidized meals

  • Partial reimbursement of public transport costs

  • Leave: 7 weeks of annual leave + 10 extra days off due to RTT (statutory reduction in working hours) + possibility of exceptional leave (sick children, moving home, etc.)

  • Possibility of teleworking and flexible organization of working hours

  • Professional equipment available (videoconferencing, loan of computer equipment, etc.)

  • Social, cultural and sports events and activities

  • Access to vocational training

  • Social security coverage
