Online learning-based anomaly detection

This work focuses on the early detection of anomalies affecting services deployed in the cloud.


Cloud computing is an attractive technology because it allows processing and storage capacity to be pooled, thereby reducing costs for the customers of these infrastructures. One of the challenges facing cloud service providers is to satisfy a wide range of reliability requirements from users with heterogeneous demands, for services that share resources. In this context, early detection of the anomalies affecting the services deployed on these infrastructures is essential.

As part of C. Sauvanaud's PhD thesis and G. Silvestre's postdoctoral work, we defined a generic, online anomaly detection system for the cloud. It enables rapid responses to potential service level violations, with the aim of identifying the sources of failure.
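To illustrate what "online" means here, the following is a minimal sketch, assuming monitoring samples arrive one at a time as dictionaries mapping a metric name (the names `cpu` and `mem` are hypothetical) to a value. It swaps in a simple unsupervised per-metric z-score rule, which is not the detection model used in [1]: each sample is first scored against running statistics maintained with Welford's algorithm and then used to update them, so no batch retraining is needed. `RunningStats`, `OnlineDetector`, the 3-standard-deviation threshold and the 10-sample warm-up are all illustrative assumptions.

```python
import math
from dataclasses import dataclass, field


@dataclass
class RunningStats:
    """Running mean and variance of one metric (Welford's online algorithm)."""
    n: int = 0
    mean: float = 0.0
    m2: float = 0.0  # sum of squared deviations from the running mean

    def update(self, x: float) -> None:
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    def std(self) -> float:
        # Sample standard deviation; 0.0 until at least two values are seen.
        return math.sqrt(self.m2 / (self.n - 1)) if self.n > 1 else 0.0


@dataclass
class OnlineDetector:
    """Hypothetical per-metric z-score detector over a stream of samples."""
    threshold: float = 3.0  # assumed alert threshold, in standard deviations
    warmup: int = 10        # assumed number of samples seen before alerting
    stats: dict = field(default_factory=dict)

    def observe(self, sample: dict) -> list:
        """Score one sample (metric name -> value), then learn from it.

        Returns the names of the metrics whose value lies more than
        `threshold` standard deviations away from their running mean.
        """
        anomalous = []
        for name, value in sample.items():
            s = self.stats.setdefault(name, RunningStats())
            sd = s.std()
            if s.n >= self.warmup and sd > 0 and abs(value - s.mean) > self.threshold * sd:
                anomalous.append(name)
            s.update(value)
        return anomalous


if __name__ == "__main__":
    detector = OnlineDetector()
    # Feed 50 "normal" samples with small periodic variations.
    for t in range(50):
        detector.observe({"cpu": 20.0 + t % 5, "mem": 512.0 + t % 3})
    # A CPU spike is flagged; the memory value stays within its usual range.
    print(detector.observe({"cpu": 95.0, "mem": 513.0}))  # ['cpu']
```

Because every sample, anomalous or not, updates the running statistics, this detector slowly adapts to persistent shifts; a real deployment would have to decide whether flagged samples should be excluded from the update.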

The detection system applies machine-learning classification models (supervised, unsupervised and hybrid) to system monitoring data collected from the hypervisors and from the service virtual machines (VMs). The anomalies it can detect include errors, preliminary symptoms of service violations and the service violations themselves. The detection performance of the system is evaluated on a cloud platform into which anomalies are artificially injected.
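The sketch below, under stated assumptions, shows the supervised side of such a pipeline and how artificially injected anomalies can provide the labels used for training and evaluation. A nearest-centroid classifier stands in for the classification models mentioned above; it is not the model evaluated in [1]. The injection-window format `(start, end, label)`, the two features per sample (CPU usage and response time) and the anomaly labels `cpu_hog` and `net_latency` are all hypothetical.

```python
import math
from collections import defaultdict


def label_samples(timestamps, injections):
    """Label each sample with the anomaly injected at that time, if any.

    `injections` is a list of hypothetical (start, end, label) windows;
    samples outside every window are labelled "normal".
    """
    labels = []
    for t in timestamps:
        label = "normal"
        for start, end, kind in injections:
            if start <= t < end:
                label = kind
                break
        labels.append(label)
    return labels


class NearestCentroid:
    """Toy supervised classifier: predicts the class with the closest mean vector."""

    def fit(self, X, y):
        sums, counts = {}, defaultdict(int)
        for x, label in zip(X, y):
            acc = sums.setdefault(label, [0.0] * len(x))
            for i, v in enumerate(x):
                acc[i] += v
            counts[label] += 1
        self.centroids = {c: [v / counts[c] for v in acc] for c, acc in sums.items()}
        return self

    def predict(self, X):
        # math.dist computes the Euclidean distance (Python 3.8+).
        return [min(self.centroids, key=lambda c: math.dist(x, self.centroids[c]))
                for x in X]


def precision_recall(y_true, y_pred, positive):
    """Precision and recall for one class treated as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


if __name__ == "__main__":
    # Hypothetical features per sample: [CPU usage %, response time ms].
    X = [[20, 5], [22, 6], [21, 5], [90, 40], [92, 42], [23, 6],
         [60, 300], [62, 310], [21, 5], [91, 41], [22, 6], [61, 305]]
    injections = [(3, 5, "cpu_hog"), (6, 8, "net_latency"),
                  (9, 10, "cpu_hog"), (11, 12, "net_latency")]
    y = label_samples(range(len(X)), injections)

    # Train on the first 8 samples, evaluate on the remaining 4.
    model = NearestCentroid().fit(X[:8], y[:8])
    y_pred = model.predict(X[8:])
    print(y_pred)  # ['normal', 'cpu_hog', 'normal', 'net_latency']
    print(precision_recall(y[8:], y_pred, "cpu_hog"))  # (1.0, 1.0)
```

The injection windows play the role of ground truth: any sample recorded while an anomaly is active inherits that anomaly's label. A meaningful evaluation would need far more samples than this toy trace, ideally with training and test data drawn from separate injection campaigns.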

The effectiveness of our approach has been validated on two case studies [1]: a database management system (MongoDB) and a virtualized network function.

Figure 10



[1] Sauvanaud C., Kaâniche M., Kanoun K., Lazri K., Da Silva Silvestre G., "Anomaly Detection and Diagnosis for Cloud Services: Practical Experiments and Lessons Learned", Journal of Systems and Software, 139, pp. 84-106, 2018.