What Does It Actually Take to Build a Conversational AI Avatar That Works in a Live Public Environment?

Devesh · May 20, 2026

The question most brand teams ask is "Can we build something like this?" The better question is "What does it actually take to make it work in a real environment, with real people, at real scale?"

When IIC Lab deployed its conversational AI avatar for HDFC Securities' Scam 2025 campaign, the experience looked seamless. A visitor walked up and spoke to the avatar, and the avatar responded within half a second, shifting tone, adapting tactics, and showing facial reactions that matched what it was saying. This piece walks through each technical layer that made that possible.

Why the Technical Foundation Matters More Than the Use Case

There is a pattern in experiential AI projects where the use case is compelling but the execution disappoints. The avatar responds with a noticeable lag. The voice sounds mechanical. The facial expression does not match the emotional content of the words. Each of these failures is a technical problem masquerading as a creative one.

Layer One — MetaHuman Character Development

The character was built in Unreal Engine's MetaHuman Creator. The work involved facial topology mapping, structural alignment to replicate jaw and brow proportions, wardrobe continuity with the campaign's television commercial (TVC), and lighting calibration for a transparent OLED display in a lit retail environment. The result was a character that visitors recognised from the campaign, which kept the installation visibly tied to the brand.

Layer Two — Voice Model Training

The voice model was trained to preserve cadence and rhythm across emotional registers, retain natural speech irregularities (pauses, emphasis, hesitations), replicate regional tonal characteristics for Indian audiences, and maintain consistency under mall acoustic conditions. Crucially, it was trained for conversation — which demands a different tonal range than broadcast delivery.

Layer Three — The Real-Time Conversational Pipeline

The full loop had to execute within 500 ms: microphone capture → speech-to-text transcription → contextual language processing → voice synthesis → avatar animation on the OLED display. The system ran on an NVIDIA RTX 5090, which provided enough GPU capacity to run real-time animation, voice synthesis, and language model inference simultaneously.
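To make the 500 ms budget concrete, here is a minimal Python sketch of how a five-stage loop like this could be timed against an end-to-end target. It is not IIC Lab's implementation. The stage functions are hypothetical stand-ins that only tag a string, and the names `Stage`, `PipelineResult`, `make_stub`, and `run_pipeline` are invented for illustration. Only the stage order and the 500 ms figure come from the deployment described above.

```python
import time
from dataclasses import dataclass
from typing import Callable, List, Tuple

BUDGET_MS = 500.0  # end-to-end target quoted in the article


@dataclass
class Stage:
    name: str
    run: Callable[[str], str]


@dataclass
class PipelineResult:
    output: str
    timings_ms: List[Tuple[str, float]]
    total_ms: float
    within_budget: bool


def make_stub(label: str) -> Callable[[str], str]:
    """Hypothetical stand-in for a real stage: it only tags the payload."""
    def run(payload: str) -> str:
        return f"{payload}>{label}"
    return run


# Stage order follows the article; the names are illustrative labels.
STAGES = [Stage(name, make_stub(name)) for name in (
    "capture", "speech_to_text", "language_model",
    "voice_synthesis", "avatar_animation",
)]


def run_pipeline(stages, payload, budget_ms=BUDGET_MS, clock=time.perf_counter):
    """Run stages in order and record each stage's latency in milliseconds.

    The total is the sum of per-stage times, so hand-off overhead between
    stages is not counted in this simplified version.
    """
    timings = []
    for stage in stages:
        start = clock()
        payload = stage.run(payload)
        timings.append((stage.name, (clock() - start) * 1000.0))
    total_ms = sum(ms for _, ms in timings)
    return PipelineResult(payload, timings, total_ms, total_ms <= budget_ms)


if __name__ == "__main__":
    result = run_pipeline(STAGES, "mic")
    for name, ms in result.timings_ms:
        print(f"{name:>16}: {ms:.3f} ms")
    print("within budget:", result.within_budget)
```

Recording latency per stage, rather than only the total, shows which layer is consuming the budget when the loop runs long. The `clock` parameter is injectable so the budget check can be exercised with fixed timestamps instead of real hardware.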

Deployment Results

The deployment recorded 2,000+ direct visitor interactions and 200,000+ impressions, with 100% completion of the digital call to action (CTA). Every participant received a personalised scam-resistance profile. It was among the earliest large-scale deployments of a real-time conversational MetaHuman avatar in a public retail environment in India.

IIC Lab's AI development team builds conversational avatars, real-time interaction pipelines, and experiential AI products for brands operating at scale.