What can we learn from signals and systems in a transformer? Insights for probabilistic modeling and inference architecture

Seminar by Prashant Mehta, Professor, Coordinated Science Laboratory, University of Illinois at Urbana-Champaign, USA

Seminar

16.07.26

mac / do

The transformer is the core algorithm inside a large language model (LLM). In a so-called decoder-only transformer, a finite sequence of symbols (tokens) is mapped to the conditional probability distribution of the next token.

In this talk, I situate the transformer within the broader history of prediction theory. In the early 1940s, Wiener introduced a linear predictor, in which the conditional expectation of future data is computed as a linear combination of past data. I argue that a decoder-only transformer generalizes this idea and is best understood as a causal nonlinear predictor. Technical results on causal nonlinear prediction are presented for the special case in which the data are discrete-valued and generated by an underlying hidden Markov model (HMM).
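For readers who want a concrete reference point, the sketch below shows the classical exact causal nonlinear predictor for the HMM setting. It is the standard forward (Bayesian filtering) recursion, not the transformer architecture studied in the talk. It maps the past tokens to the conditional probability of the next token. The sketch assumes a finite-state HMM with initial distribution `mu`, transition matrix `A[i][j] = P(X_{t+1}=j | X_t=i)`, and emission matrix `C[i][k] = P(Y_t=k | X_t=i)`. These names and conventions are illustrative and are not taken from the talk or the papers.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def next_token_probs(tokens: Sequence[int], mu: List[float],
                     A: Matrix, C: Matrix) -> List[float]:
    """Return P(Y_{t+1} = k | Y_0, ..., Y_t) for every symbol k.

    Hypothetical helper: mu, A, C follow the conventions stated above.
    """
    d = len(mu)      # number of hidden states
    m = len(C[0])    # number of observable symbols (tokens)
    prior = list(mu)  # predicted distribution of the current hidden state
    for y in tokens:
        # Bayes update: weight each hidden state by the likelihood of token y.
        post = [prior[i] * C[i][y] for i in range(d)]
        z = sum(post)
        if z == 0.0:
            raise ValueError("observed token has zero probability under the model")
        post = [p / z for p in post]  # normalization makes the map nonlinear
        # Prediction step: propagate the filter through the transition matrix.
        prior = [sum(post[i] * A[i][j] for i in range(d)) for j in range(d)]
    # Push the predicted hidden-state distribution through the emissions.
    return [sum(prior[j] * C[j][k] for j in range(d)) for k in range(m)]


# Usage with a toy two-state, two-symbol model (illustrative numbers).
mu = [0.5, 0.5]
A = [[0.9, 0.1], [0.2, 0.8]]
C = [[0.8, 0.2], [0.1, 0.9]]
print(next_token_probs([], mu, A, C))   # unconditional next-token law
print(next_token_probs([0], mu, A, C))  # after observing token 0
```

The normalization by `z` is what makes this predictor nonlinear in the past data, in contrast with Wiener's predictor, which combines past observations linearly. For this toy model, observing token 0 raises the predicted probability of token 0 from 0.45 to about 0.676.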

The aim of this ongoing research is to bridge classical nonlinear filtering theory with modern inference architectures inspired by transformers. The work is carried out jointly with Heng-Sheng Chang and Jin Won Kim, and the talk is based on the following papers: https://arxiv.org/abs/2605.15608 and https://arxiv.org/abs/2505.00818.

Published on 27.06.26