Unsupervised classification of coded hyperspectral data using statistical tests

Trung-Tin DINH PhD defense

17.12.25

Hyperspectral imaging simultaneously captures the spatial and spectral information of a scene, each pixel containing a spectrum of several dozen to several hundred bands. This spectral richness enables a wide range of applications, many of which rely on spectral classification to distinguish and identify materials. Traditionally, the hyperspectral cube is acquired by scanning, a process that requires long acquisition times and generates large data volumes. To address these limitations, compressed snapshot imagers have been developed, such as the DD-CASSI (Dual-Disperser Coded Aperture Snapshot Spectral Imager), which relies on a coded mask. Instead of producing a full hyperspectral cube, these instruments yield coded measurements in which each pixel contains a linear combination of spectral components determined by the mask and the system's dispersion. The standard approach is to first reconstruct the complete cube and then apply classification methods, but this reconstruction is computationally demanding and may introduce artifacts. An alternative, explored in this thesis, is to operate directly on the coded data, without any reconstruction step.

This thesis therefore proposes an unsupervised classification method that works on the coded data. The method exploits spectro-spatial correlations through the Separability Assumption (SA) in homogeneous regions, introduced by Ardi (2020) in the context of hyperspectral reconstruction: a hyperspectral image is assumed to decompose into homogeneous regions, each characterized by a single reference spectrum weighted by local intensity variations. In this thesis, this assumption is regarded as a simple model of intraclass spectral variability, and statistical tests are applied to candidate regions to evaluate it locally.
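As a minimal numerical sketch of the Separability Assumption (the band count, intensity range, and binary mask pattern below are illustrative assumptions, not the instrument's actual model):

```python
import numpy as np

rng = np.random.default_rng(0)

# Separability Assumption (SA): within a homogeneous region, every pixel
# spectrum is the region's reference spectrum scaled by a per-pixel
# intensity factor.
n_bands = 100
reference = np.abs(rng.normal(1.0, 0.3, n_bands))   # reference spectrum r
intensities = rng.uniform(0.5, 1.5, size=20)        # local intensity weights a_p

# Pixel spectra of the homogeneous region: s_p = a_p * r
region = intensities[:, None] * reference[None, :]

# Under SA, all pixel spectra are collinear: normalizing each spectrum
# recovers the same direction as the reference spectrum.
normalized = region / np.linalg.norm(region, axis=1, keepdims=True)
assert np.allclose(normalized, normalized[0])

# A DD-CASSI-like coded pixel is a linear combination of the spectrum's
# bands; the binary weights below are a toy stand-in for the actual
# mask/dispersion model.
weights = rng.integers(0, 2, n_bands)
coded = region @ weights

# The coded values inherit the SA proportionality, which is why one can
# hope to detect homogeneous regions without reconstructing the cube.
assert np.allclose(coded, intensities * (reference @ weights))
```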
For this purpose, we assume that photon noise, classically modeled by a Poisson distribution, can be approximated by Gaussian noise, and we use Gaussianity tests. Building on the Separability Assumption and these tests, I propose an iterative unsupervised classification algorithm for coded data, named CHOUCROUTE, which operates in three steps: detection, growth, and merging of homogeneous regions belonging to the same class. The algorithm was evaluated on both synthetic and realistic hyperspectral scenes. On synthetic data, it produces coherent classifications that closely match the ground truth. On realistic data, evaluation is more challenging because the available annotations are of limited reliability: they do not always capture the spectral complexity of the scenes and may bias the comparison. The study also analyzes the sensitivity of the classification results to the algorithm's parameter choices. These experiments highlight that uncertainties in the ground truth make the evaluation of classification methods particularly delicate.
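The Poisson-to-Gaussian approximation behind the tests can be illustrated with a toy sketch (simulated counts with known means; the thesis's actual test statistics and region model may differ):

```python
import numpy as np
from scipy.stats import shapiro

rng = np.random.default_rng(1)

# Photon noise: Poisson counts around SA-consistent means. For large
# counts, Poisson(lam) is close to Normal(lam, lam), which justifies
# applying a Gaussianity test to suitably normalized residuals.
mean_level = 1000.0
intensities = rng.uniform(0.8, 1.2, size=50)
expected = intensities * mean_level               # SA-consistent coded means
counts = rng.poisson(expected).astype(float)      # noisy coded measurements

# Standardized residuals under the Gaussian approximation:
residuals = (counts - expected) / np.sqrt(expected)

# Shapiro-Wilk test: a high p-value means no evidence against
# Gaussianity, i.e. the region is compatible with the SA + photon-noise
# model; a low p-value flags a candidate region as non-homogeneous.
stat, p_value = shapiro(residuals)
print(f"W = {stat:.3f}, p = {p_value:.3f}")
```

In practice the means would have to be estimated from the coded data rather than known in advance, which is part of what makes the region-testing step non-trivial.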

published on 23.01.26