AI Inference Infrastructure Engineer
FAR Labs
01. ABOUT THE COMPANY
Dizzaract is a product-driven company operating at the intersection of gaming, digital platforms, and AI. We build and scale multiple products, including Farcana 2.0, Gamed, and FAR Labs. Each explores a different space, but all are united by a shared approach: moving fast, staying curious, and focusing on things that people actually use. We operate as a collaborative, non-hierarchical team where ideas are valued based on their impact, not their origin, and where AI is embedded across everything we build, from infrastructure to product decisions.

02. ABOUT THE ROLE
FAR Labs is building FAR AI, a distributed inference network that turns idle GPU capacity into usable compute for developers, businesses, and AI products. Around this infrastructure, we build a wider ecosystem of products, payment tools, developer platforms, and internal systems that help prove and grow the network through real usage.
We’re looking for an AI Inference Infrastructure Engineer to build the core engine powering this system, from the low-level runtime to distributed orchestration. This is a deeply technical role focused on performance, efficiency, and control over hardware. You will work close to the metal, designing systems that operate at the limits of modern compute.

03. WHO YOU ARE
Systems Engineer: You have deep experience in systems programming and care about performance at every layer.
Performance Optimiser: You understand how to extract maximum efficiency from hardware, not just how to make things work.
Infrastructure Builder: You’ve built or contributed to distributed systems where latency and throughput matter.
Low-Level Thinker: You’re comfortable working close to the hardware: memory, kernels, and execution pipelines.
Pragmatic Engineer: You avoid unnecessary abstractions and focus on what actually improves performance.

04. RESPONSIBILITIES
Core Engine Development: Design and implement low-level runtime systems (Rust/C) to manage model execution, memory allocation, and request batching across distributed compute.
Hardware-Aware Optimisation: Develop and optimise tensor operations and custom kernels (CUDA, Triton) to maximise hardware utilisation.
Inference Performance: Implement advanced inference techniques such as continuous batching, speculative decoding, and paged attention, tailored to our architecture.
Decentralised Orchestration: Design routing and load-balancing systems across geographically distributed nodes to ensure availability and efficiency.
Autonomous Infrastructure: Build fully automated deployment and recovery systems that require no manual intervention and keep the network self-healing.

05. REQUIREMENTS
Strong experience in systems programming (Go, Rust, or C/C++).
Proven experience working with performance-critical systems.
Deep understanding of GPU programming (CUDA, ROCm) and hardware-level optimisation.
Strong knowledge of deep learning architectures (Transformers, Mamba) and tensor execution.
Experience building distributed systems with high concurrency and low latency.
Ability to work close to the hardware and optimise execution pipelines.
Strong problem-solving skills and attention to performance bottlenecks.

06. WHAT WE OFFER
Real ownership and direct impact on a global AI infrastructure product.
Fast execution, low bureaucracy, and a highly collaborative, idea-driven team.
Exposure to cutting-edge AI, distributed systems, and hardware-level optimisation.
Competitive salary with performance-based incentives.
24 days of annual leave, plus public holidays.
Health insurance.
Modern office in Yas Creative Hub.
Continuous learning through real-world problem solving, not just theory.
The opportunity to shape what you’re working on and influence product direction.
A diverse, open-minded team where ideas are genuinely heard.