LLM Engineer (SFT/RLHF/Post-Training) at Achieve Group – Dubai

Achieve Group is seeking a highly specialised LLM Engineer focused on post-training techniques, including Supervised Fine-Tuning (SFT), Reinforcement Learning from Human Feedback (RLHF), and Direct Preference Optimisation (DPO), for a remote role based in Dubai. This is your chance to work on frontier language models at the 7B to 100B+ parameter scale.

About the Role

You’ll be embedded in the post-training pillar of a world-class AI research and engineering team, responsible for designing and running alignment pipelines that improve model behaviour, safety, and instruction-following. Day to day, this involves curating training data, running fine-tuning experiments, evaluating model outputs across benchmarks, and iterating rapidly on RLHF/DPO pipelines. You’ll have access to cutting-edge GPU infrastructure and collaborate with leading AI researchers.

Key Requirements

3+ years of hands-on experience in LLM training, fine-tuning, or alignment (SFT, RLHF, DPO, PPO)
Deep proficiency in Python, PyTorch, and large-scale distributed training frameworks (DeepSpeed, FSDP)
Experience training models at 7B–100B+ parameter scale on multi-GPU/multi-node clusters
Strong understanding of preference data collection, reward model training, and evaluation methodology
Publications or demonstrable applied work in LLM post-training is highly desirable

Why Dubai?

Dubai is fast becoming a global AI hub, with significant government investment in AI infrastructure and a booming tech ecosystem. This remote role lets you benefit from Dubai’s tax-free income while working on some of the world’s most advanced AI systems.

Apply now: View the full job listing on LinkedIn