Augmenting Internet measurements with metadata for a better Internet cartography

Omar Darwich PhD defense

Defense

04.05.26 - 04.05.26

Understanding the Internet at scale is fundamentally limited by partial visibility, decentralized control, and the heterogeneous nature of available measurement data. While a wide range of control plane and active measurement techniques exist, many Internet measurements lack the contextual information required for meaningful interpretation. This thesis addresses this challenge by focusing on the enrichment of existing Internet measurement data with additional semantic metadata that is not directly observable but is critical for understanding network behavior. The central objective of this thesis is to augment widely available Internet data (such as IP addresses, routing information, and interdomain paths) with inferred attributes that add geographic, operational, and traffic-related context. Rather than collecting entirely new measurements, this work develops methods that extract additional insight from partial and publicly accessible observations.

The first contribution of the thesis focuses on IP geolocation. It presents a systematic approach for evaluating, comparing, and reproducing geolocation results using public data sources, addressing long-standing issues of inconsistency, lack of transparency, and limited ground truth. By improving the reproducibility of geolocation measurements, this contribution enables more reliable geographic analysis of Internet infrastructure and routing behavior.

The second contribution addresses the limited visibility of interdomain traffic engineering. It develops techniques to infer traffic engineering actions from control plane routing data, despite the absence of explicit signaling in BGP. These methods enrich routing observations with evidence of policy-driven behavior, improving the interpretability of routing dynamics and supporting more accurate root cause analysis of routing changes.

The third contribution focuses on estimating interdomain traffic volumes from limited measurement data. It proposes methods for augmenting routing information with traffic estimates, helping to bridge the gap between control plane visibility and actual data plane behavior. This enables a more complete view of how routing decisions affect traffic distribution across the Internet.

Together, these contributions demonstrate that enriching Internet measurements with carefully inferred metadata significantly improves the scope, accuracy, and reproducibility of Internet analysis. By connecting geographic context, routing intent, and traffic behavior, this thesis provides a unified framework for studying the Internet across multiple layers, despite its inherent decentralization and limited observability.

published on 29.04.26