Dependability benchmarking

Building on long-term experience in the experimental evaluation of fault-tolerant computer systems and in the characterization of the failure modes of off-the-shelf software components (commercial or open source), our work on dependability benchmarking was developed intensively within the European project DBench [see http://www.laas.fr/DBench]. It aims to assess the robustness of operating systems (OSs) with respect to system calls corrupted at the Application Programming Interface (API) or at the Driver Programming Interface (DPI), as shown in Figure 1.

Figure 1: The operating system kernel and its environment

Corrupting system calls at the API level simulates application failures that result in erroneous requests to the services of the OS [Kanoun et al. 2005, Kanoun et al. 2008]. Besides traditional robustness measures (error code returned, exception raised, kernel hung, application aborted, etc.), the benchmark also covers timing aspects (i.e., OS response time in the presence of faults and OS restart time after fault activation). We have developed a testbed framework suitable for OS dependability characterization [Kalakech 2005]. In particular, the framework was used to compare the reliability of three generations of the Windows family (NT4, 2000 and XP). As an example of results, Figure 2 gives the OS restart times in the presence of faults.

 


Figure 2: The operating system restart times in the presence of faults
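
The API-level principle described above (substitute an invalid value for one parameter of a call, then record how the service reacts and how long it takes) can be sketched in a few lines of Python. This is an illustrative sketch only, not the testbed of [Kalakech 2005]: the stand-in service `fake_read`, the outcome labels and the list of substituted values are all assumptions made for the example, and outcomes such as a kernel hang or an application abort cannot be observed from inside a single process.

```python
import errno
import time


def fake_read(fd, size):
    """Hypothetical stand-in for an OS service (not a real system call)."""
    if not isinstance(fd, int):
        raise TypeError("fd must be an integer")
    if fd < 0:
        # Mimics a service that rejects the request with an error code.
        raise OSError(errno.EBADF, "bad file descriptor")
    return b"\0" * size


def classify_call(call, *args):
    """Run one call; return (outcome label, response time in seconds)."""
    start = time.perf_counter()
    try:
        call(*args)
        outcome = "no_signaling"   # the corrupted request was accepted
    except OSError:
        outcome = "error_code"     # the service reported an error code
    except Exception:
        outcome = "exception"      # any other exception was raised
    return outcome, time.perf_counter() - start


def run_campaign(call, nominal_args, position, invalid_values):
    """Corrupt one parameter at a time and count the observed outcomes."""
    counts = {}
    for value in invalid_values:
        args = list(nominal_args)
        args[position] = value     # substitute the corrupted value
        outcome, _ = classify_call(call, *args)
        counts[outcome] = counts.get(outcome, 0) + 1
    return counts


if __name__ == "__main__":
    # Corrupt the first parameter of a nominal call fake_read(3, 16).
    print(run_campaign(fake_read, (3, 16), 0, [-1, None, "x", 7]))
```

A real campaign would intercept actual system calls issued by a workload and would need an external monitor to detect hangs and to measure restart times.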

By analogy with the API, we have introduced the notion of DPI, which gathers the set of kernel functions available for programming drivers. The DPI thus provides a suitable interface for simulating the erroneous behaviors induced by faulty drivers, to which a large proportion of OS failures are attributed. Here again, calls to kernel services are corrupted on the fly [Albinet et al. 2004, Albinet et al. 2008a]. In particular, the benchmarking framework that we have developed has allowed us to carry out a detailed study of the basic services (functions) provided to drivers by the Linux kernel (e.g., see Figure 3) [Albinet 2005, Albinet et al. 2008b]. Such insights help identify the most vulnerable services, so that the definition and integration of relevant protection mechanisms can focus on them. Other practical analyses have assessed the behavior of the kernel from differing end-user viewpoints, e.g., responsiveness (the kernel reacts with explicit error notifications), availability (the kernel does not hang), and safety (the service provided by the application processes is not corrupted).

Figure 3: Distribution of the observations in the presence of faulty drivers
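
The viewpoint-based analysis mentioned above can be illustrated with a small Python sketch that turns a set of experiment observations into one score per viewpoint. The observation labels and the rule attached to each viewpoint are assumptions chosen for this example; they paraphrase the definitions given in the text and are not the measures defined in [Albinet 2005].

```python
# Hypothetical observation labels for one fault-injection experiment.
ERROR_NOTIFIED = "error_notified"        # kernel returned an explicit error
KERNEL_HANG = "kernel_hang"              # kernel stopped responding
SERVICE_CORRUPTED = "service_corrupted"  # application service was corrupted

# Each viewpoint maps to a predicate that tells whether an experiment
# is favorable from that viewpoint (an assumption for this sketch).
VIEWPOINTS = {
    "responsiveness": lambda obs: ERROR_NOTIFIED in obs,
    "availability": lambda obs: KERNEL_HANG not in obs,
    "safety": lambda obs: SERVICE_CORRUPTED not in obs,
}


def score(experiments):
    """Return, per viewpoint, the fraction of favorable experiments.

    `experiments` is a non-empty list of sets of observation labels,
    one set per fault-injection experiment.
    """
    total = len(experiments)
    return {
        name: sum(1 for obs in experiments if favorable(obs)) / total
        for name, favorable in VIEWPOINTS.items()
    }


if __name__ == "__main__":
    campaign = [
        {ERROR_NOTIFIED},
        {KERNEL_HANG},
        {ERROR_NOTIFIED, SERVICE_CORRUPTED},
        set(),  # no observation at all
    ]
    print(score(campaign))
```

The same campaign can thus rank differently depending on the viewpoint: a kernel that never hangs but silently corrupts data scores well on availability and poorly on safety.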

Publications

[Albinet et al. 2004] A. Albinet, J. Arlat, J.-C. Fabre, “Characterization of the Impact of Faulty Drivers on the Robustness of the Linux Kernel,” in Proc. IEEE/IFIP Int. Conf. on Dependable Systems and Networks (DSN-2004), Florence, Italy, 2004, pp. 867-876.

[Albinet 2005] A. Albinet, Dependability Characterization of Operating Systems in presence of Faulty Drivers, PhD Dissertation, National Polytechnic Institute, Toulouse, March 2005. (In French)

[Albinet et al. 2008a] A. Albinet, J. Arlat, J.-C. Fabre, “Benchmarking the Impact of Faulty Drivers: Application to the Linux Kernel,” in Dependability Benchmarking for Computer Systems (K. Kanoun and L. Spainhower, Eds.), pp. 285-310, IEEE CS Press and Wiley, 2008.

[Albinet et al. 2008b] A. Albinet, J. Arlat, J.-C. Fabre, “Robustness of Software Executives: Characterization of the Impact of Driver Malfunctions by Fault Injection,” Tech. et Sc. Informatiques, vol. 27, no. 9-10, pp. 1253-1286, 2008. (In French)

[Kalakech 2005] A. Kalakech, “Dependability Benchmarking of Operating Systems: Specifications and Implementation,” LAAS report No 05336, PhD Dissertation, National Polytechnic Institute, Toulouse, June 2005.

[Kanoun et al. 2005] K. Kanoun, Y. Crouzet, A. Kalakech, A.E. Rugina, Ph. Rumeau, “Benchmarking the Dependability of Windows and Linux using PostMark Workloads,” in Proc. 16th IEEE International Symposium on Software Reliability Engineering (ISSRE 2005), Chicago, Illinois, 8-11 November 2005, pp. 11-20.

[Kanoun et al. 2008] K. Kanoun, Y. Crouzet, A. Kalakech, A.E. Rugina, “Windows and Linux Robustness Benchmarks With Respect to Application Erroneous Behavior,” in Dependability Benchmarking for Computer Systems (K. Kanoun and L. Spainhower, Eds.), pp. 277-254, IEEE CS Press and Wiley, 2008.