Poster Session A, Wednesday, May 20, 10:15 am – 11:00 am
Board 16
Overt Visual Attention and Attribution Alignment in Transformer-Based Reading Models
Lingchen Kong1, Jinnie Shin1, Pavlo Antonenko1; 1University of Florida
Transformer-based language models rely on attention mechanisms that are often interpreted as cognitively meaningful, yet their correspondence to human visual attention during reading remains uncertain. This study examines how transformer attention and attribution-based explanation signals align with eye-tracking measures of human visual attention and with human relevance judgments in a controlled, task-oriented reading setting. Using the ZuCo 2.0 corpus, we analyze eye-tracking data collected while readers performed a sentence-level semantic relation classification task. Human overt visual attention was operationalized as mean word-level gaze duration, and relevance judgments as human-annotated word importance ratings. Six pretrained transformer models (ALBERT, DistilBERT, BERT, RoBERTa, and DeBERTa in its base and large variants) were fine-tuned on the same task. Model attention weights were extracted from the first and last layers and aggregated to the word level, while attribution scores were computed using Integrated Gradients (IG), Leave-One-Out (LOO), and Local Interpretable Model-Agnostic Explanations (LIME). Alignment between model-derived signals and human measures was quantified with word-level Pearson correlations. Results show a positive correlation between transformer attention weights and human gaze duration that is strongest in the first attention layer and most pronounced for BERT and DistilBERT (Pearson r up to 0.66), whereas RoBERTa exhibits negligible alignment and ALBERT shows stronger correspondence in its final layer. Attention-gaze correlations remain stable across fine-tuning epochs despite gains in task performance, suggesting that fine-tuning has little effect on perceptual alignment. In contrast, attribution methods, particularly IG and LOO, align more strongly with human word importance judgments, especially for correctly classified and semantically explicit sentences, while LIME shows weak and unstable correspondence. Overall, the findings indicate that early-layer transformer attention partially reflects human visual salience during reading, whereas attribution methods more closely approximate human relevance judgments when the model's prediction is correct. These results highlight a dissociation between perceptual and decision-level alignment and underscore that higher task accuracy alone does not guarantee cognitively grounded explanations.
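
To make the attention-gaze pipeline concrete, here is a minimal Python sketch of one plausible reading of the method: extract first-layer attention from a BERT encoder, average over heads, score each token by the attention it receives, sum subword scores back to words, and correlate the result with gaze durations. The abstract does not specify the head-pooling or subword-aggregation scheme, and the sentence, gaze values, and use of an off-the-shelf (rather than fine-tuned) checkpoint are all illustrative assumptions.

```python
# Illustrative sketch (not the authors' code): first-layer attention from BERT,
# aggregated to the word level and correlated with per-word gaze durations.
import numpy as np
import torch
from scipy.stats import pearsonr
from transformers import AutoModel, AutoTokenizer

model_name = "bert-base-uncased"  # stand-in; the study fine-tunes six models
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name, output_attentions=True)
model.eval()

words = ["The", "composer", "wrote", "the", "symphony", "in", "1824"]
gaze_ms = [110.0, 310.0, 240.0, 95.0, 330.0, 90.0, 280.0]  # invented values

enc = tokenizer(words, is_split_into_words=True, return_tensors="pt")
with torch.no_grad():
    out = model(**enc)

# out.attentions holds one (batch, heads, seq, seq) tensor per layer. Take
# layer 0, average over heads, and score each token by the attention it
# receives (column mean); one common choice, left unspecified in the abstract.
attn = out.attentions[0].mean(dim=1)[0]      # (seq, seq)
token_scores = attn.mean(dim=0).numpy()      # attention received per token

# Map subword tokens back to words and sum their scores.
word_scores = np.zeros(len(words))
for tok_idx, word_idx in enumerate(enc.word_ids(0)):
    if word_idx is not None:                 # skip [CLS] / [SEP]
        word_scores[word_idx] += token_scores[tok_idx]

r, p = pearsonr(word_scores, gaze_ms)
print(f"word-level Pearson r = {r:.2f} (p = {p:.3f})")
```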
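On the attribution side, Integrated Gradients can be approximated by interpolating between a baseline and the input embeddings and averaging gradients along that path. The abstract does not say which IG implementation, baseline, or step count was used; the zero-embedding baseline, 32 steps, per-token L2 norm, and two-class head below are assumptions for illustration.

```python
# Illustrative Integrated Gradients sketch over input embeddings (Riemann
# approximation of the path integral); baseline and steps are assumptions.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_name = "bert-base-uncased"  # stand-in for a fine-tuned relation classifier
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)
model.eval()

enc = tokenizer("The composer wrote the symphony in 1824", return_tensors="pt")
inputs = model.get_input_embeddings()(enc["input_ids"]).detach()  # (1, seq, hid)
baseline = torch.zeros_like(inputs)        # zero-embedding baseline (assumed)
target, steps = 1, 32                      # illustrative class index, path steps

grads = torch.zeros_like(inputs)
for alpha in torch.linspace(0.0, 1.0, steps):
    point = (baseline + alpha * (inputs - baseline)).requires_grad_(True)
    logit = model(inputs_embeds=point,
                  attention_mask=enc["attention_mask"]).logits[0, target]
    (g,) = torch.autograd.grad(logit, point)
    grads += g

ig = (inputs - baseline) * grads / steps   # attribution per embedding dimension
token_scores = ig.norm(dim=-1)[0]          # L2 norm gives one score per token

for tok, s in zip(tokenizer.convert_ids_to_tokens(enc["input_ids"][0]), token_scores):
    print(f"{tok:>10s}  {s.item():.4f}")
```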
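Leave-One-Out is the simplest of the three methods to sketch: a word's importance is the drop in the reference-class probability when that word is deleted. The sentence, label, and untrained two-class head below are placeholders; in the study the classifiers were fine-tuned on the relation task, and only then would the scores be meaningful.

```python
# Illustrative Leave-One-Out (LOO) sketch: score each word by the drop in the
# reference-class probability when it is removed from the sentence.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_name = "bert-base-uncased"  # stand-in for a fine-tuned relation classifier
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)
model.eval()

def class_prob(words, label):
    """Probability assigned to `label` for the sentence formed by `words`."""
    enc = tokenizer(" ".join(words), return_tensors="pt")
    with torch.no_grad():
        logits = model(**enc).logits
    return torch.softmax(logits, dim=-1)[0, label].item()

words = ["The", "composer", "wrote", "the", "symphony", "in", "1824"]
label = 1                                  # illustrative reference class
base = class_prob(words, label)

# Importance of word i = probability drop when word i is deleted.
loo = [base - class_prob(words[:i] + words[i + 1:], label) for i in range(len(words))]
for w, s in zip(words, loo):
    print(f"{w:>10s}  {s:+.4f}")
```

Word-level scores from either attribution sketch could then be correlated against the human importance annotations in the same way as the attention scores above.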



