
Poster Session C, Wednesday, May 20, 4:15 – 5:00 pm
Board 12

Gaze entropy as a dynamic metric of engagement in audiovisual displays with multiple talking faces

Katia Steinfeld1, Micah M. Murray1,2, David J. Lewkowicz3; 1Lausanne University Hospital and University of Lausanne, Switzerland, 2The Sense Innovation and Research Center, Switzerland, 3Child Study Center, Yale School of Medicine

Digital displays, such as virtual meeting rooms and classrooms, present users with multiple talking faces. Optimizing such displays requires metrics that can track user engagement to identify which audiovisual streams capture gaze. We analyzed eye-tracking data sampled at 60 Hz from adults (n=37) and children (n=149) viewing a composite video of multiple talkers while hearing one utterance. Only one talker’s visual utterance was synchronized with the concurrent auditory speech stream, while the other talkers’ visual utterances were temporally desynchronized. Results showed that audiovisual synchrony was associated with a reduction in stationary and transition gaze entropy, reflecting increasingly structured sampling of the synchronized talker, while asynchronous scenes maintained higher entropy. Children exhibited persistently higher entropy even in the presence of audiovisual synchrony, indicating less structured engagement. Our findings suggest that gaze entropy provides a sensitive and computationally simple metric for quantifying the organization of gaze dynamics in digital displays involving multiple talkers. We argue that such metrics could be integrated into adaptive display systems to detect reductions in engagement and dynamically enhance talker salience by modulating audiovisual synchrony or layout, notably in virtual meetings and educational interfaces. These results also highlight the importance of considering developmental differences when designing audiovisual displays for educational use. More broadly, gaze entropy offers a practical tool for evaluating and optimizing user engagement in digital environments without requiring explicit tasks.
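For readers unfamiliar with the two entropy measures named above, the following is a minimal sketch of how stationary and transition (first-order Markov) gaze entropy are conventionally computed from an eye-tracking stream discretized into areas of interest (AOIs, here one AOI per talking face). It illustrates the general technique rather than the authors' exact analysis pipeline; the function names and toy sequence are illustrative only.

    import numpy as np

    def stationary_entropy(aoi_sequence, n_aois):
        # Shannon entropy of the overall distribution of gaze samples across AOIs.
        counts = np.bincount(aoi_sequence, minlength=n_aois)
        p = counts / counts.sum()
        p = p[p > 0]
        return -np.sum(p * np.log2(p))

    def transition_entropy(aoi_sequence, n_aois):
        # Conditional entropy of AOI-to-AOI transitions, weighted by how often
        # each AOI is the origin of a transition (first-order Markov model).
        trans = np.zeros((n_aois, n_aois))
        for a, b in zip(aoi_sequence[:-1], aoi_sequence[1:]):
            trans[a, b] += 1
        row_totals = trans.sum(axis=1)
        p_origin = row_totals / row_totals.sum()
        h = 0.0
        for i in range(n_aois):
            if row_totals[i] == 0:
                continue
            p_ij = trans[i] / row_totals[i]
            p_ij = p_ij[p_ij > 0]
            h += p_origin[i] * -np.sum(p_ij * np.log2(p_ij))
        return h

    # Toy example: a short 60 Hz gaze stream coded as AOI indices (0, 1, 2 = three talkers)
    gaze = np.array([0, 0, 1, 1, 1, 2, 0, 0, 0, 1])
    print(stationary_entropy(gaze, n_aois=3), transition_entropy(gaze, n_aois=3))

Lower values on both measures indicate more structured, concentrated sampling of a subset of faces; higher values indicate more dispersed, less organized scanning.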

Acknowledgements: This work was done in part under the Multisensory Environments to study Longitudinal Development (MELD) consortium (https://lab.vanderbilt.edu/meld/), which is supported by an unrestricted gift from Reality Labs Research, a division of Meta.
