AI Infrastructure Engineer at Dautom in Dubai

About the Role

The AI Infrastructure Engineer is a platform specialist responsible for architecting, building, and operating high-performance AI infrastructure for advanced AI workloads, including LLMs, GenAI, Computer Vision, and MLOps. The role focuses on managing GPU clusters (NVIDIA A100/H100), deploying and maintaining Red Hat OpenShift AI (formerly Red Hat OpenShift Data Science, RHODS), and ensuring secure, scalable, and cost-efficient AI platforms across SDD’s Sovereign Cloud and hybrid/multi-cloud environments. The engineer will enable enterprise-grade AI adoption for 200+ government entities.

Key Responsibilities & Deliverables

Design and implement GPU-based compute clusters. Define reference architectures for LLM hosting, Vector Databases, MLOps, and high-performance storage/networking.

Deliverables: Fully operational GPU-based AI infrastructure; GPU cluster uptime and performance utilization; reduction in cost per training/inference workload.

Install, configure, and optimize core components: CUDA, cuDNN, NCCL, NVIDIA Drivers, and GPU Operators. Implement GPU partitioning, scheduling, and performance tuning for high-end GPUs (e.g., A100/H100).

Deliverables: High-availability architecture for all AI workloads; complete documentation and runbooks.

OpenShift AI (RHODS) Management

Deploy, configure, and maintain the Red Hat OpenShift AI (RHODS) platform for multi-tenant use. Manage the integration of the NVIDIA GPU Operator for efficient GPU scheduling, and support data scientists with notebooks, training, and inference endpoints.

How to Apply

Apply Now on LinkedIn
