Scalable algorithms, protocols and services

The aim of this work is to design algorithms for resource allocation and scheduling, as well as for service discovery and composition, that scale to the size of tomorrow's networks.


Coflow scheduling

Many parallel computing applications (e.g. MapReduce) consist of multiple computation steps, and intermediate results usually need to be transferred between compute servers at each step. Often, a step cannot start or complete until all intermediate results from the previous step have been received. To optimize the performance of data transfers in a parallel application, it is therefore more relevant to consider the completion time of the whole set of flows rather than that of each individual point-to-point flow. The concept of coflow has been introduced for this purpose: a coflow is a collection of parallel flows whose completion time is determined by the completion time of the last flow in the collection.
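
To make the definition concrete, the short Python sketch below (an illustration only, with hypothetical Flow fields and a constant-rate assumption) computes a coflow's completion time as the maximum of its flows' individual completion times.

```python
# Illustrative sketch: the coflow completion time (CCT) is the completion
# time of the slowest flow in the collection.
from dataclasses import dataclass

@dataclass
class Flow:
    src: str        # sending server
    dst: str        # receiving server
    size: float     # bytes to transfer
    rate: float     # allocated bandwidth (bytes/s), assumed constant here

def flow_completion_time(flow: Flow) -> float:
    """Time needed by one point-to-point flow at its allocated rate."""
    return flow.size / flow.rate

def coflow_completion_time(coflow: list[Flow]) -> float:
    """The coflow finishes only when its last flow finishes."""
    return max(flow_completion_time(f) for f in coflow)

# Example: three shuffle flows of a MapReduce stage.
shuffle = [Flow("m1", "r1", 8e9, 1e9),   # 8 s
           Flow("m2", "r1", 2e9, 1e9),   # 2 s
           Flow("m3", "r2", 5e9, 1e9)]   # 5 s
print(coflow_completion_time(shuffle))   # 8.0 -- dominated by the slowest flow
```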

In the literature, the usual performance measures for coflow scheduling are the makespan and the weighted coflow completion time. A decade of research on this problem has highlighted its complexity, and several algorithmic solutions have been developed. The context changes radically, however, for mission-critical parallel applications, where the data transfer phase may be subject to strict deadline constraints. In this setting, our work aims to develop online solutions for the joint admission control and scheduling of coflows with deadlines, with the objective of minimizing the number of late coflows.
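
As an illustration of what such an online policy might look like, the sketch below implements one simple, hypothetical admission test (not the algorithm under development): a newly arriving coflow is admitted only if every port it traverses can still provide the rate it needs to meet its deadline, given the bandwidth already reserved for previously admitted coflows.

```python
# Hypothetical deadline-aware admission test for coflows (illustration only).
# A coflow is described by a deadline and a list of flows, each flow by its
# size and the source/destination ports it uses.

def required_rates(coflow, now):
    """Per-port rate needed for the coflow to finish by its deadline."""
    slack = coflow["deadline"] - now
    if slack <= 0:
        return None  # deadline already unreachable
    demand = {}  # port -> total bytes the coflow must push through it
    for flow in coflow["flows"]:
        for port in (flow["src_port"], flow["dst_port"]):
            demand[port] = demand.get(port, 0.0) + flow["size"]
    return {port: volume / slack for port, volume in demand.items()}

def admit(coflow, reserved, capacity, now):
    """Accept the coflow and reserve bandwidth for it, or reject it outright."""
    rates = required_rates(coflow, now)
    if rates is None:
        return False
    if any(reserved.get(p, 0.0) + r > capacity[p] for p, r in rates.items()):
        return False  # some port cannot sustain the extra rate
    for p, r in rates.items():
        reserved[p] = reserved.get(p, 0.0) + r
    return True
```

Rejecting an infeasible coflow at arrival, rather than letting it compete for bandwidth, protects the deadlines of the coflows already admitted; this is the kind of trade-off an online joint admission-and-scheduling policy has to make.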

Monitoring / Distributed tracing in the cloud

Service orchestration is an automated process that manages the lifecycle of each component, and in particular its allocation to the various resources of the cloud infrastructure. While cloud technologies facilitate the development and deployment of applications, they complicate debugging and performance analysis. Detecting partial failures, bottlenecks or cyber attacks is a new challenge that has come with the adoption of cloud applications and which, because of the abstractions introduced by the orchestration process, is extremely complex to address. Typically, a very costly introspection of the application code by development teams is required to pinpoint the fault.

Distributed tracing is an emerging paradigm that gives developers partial visibility into the composition of services within a distributed application. This technology is now increasingly adopted in cloud-native architectures.
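
The following toy sketch (a simplified data model, not any particular tracing system) illustrates the core idea: each operation is recorded as a span carrying a trace identifier and a reference to its parent span, so that the end-to-end path of a request through the services can be reconstructed afterwards.

```python
# Toy illustration of the distributed-tracing data model.
import time
import uuid

class Span:
    def __init__(self, name, trace_id, parent_id=None):
        self.name = name
        self.trace_id = trace_id             # shared by all spans of one request
        self.span_id = uuid.uuid4().hex[:8]  # unique to this operation
        self.parent_id = parent_id           # links the span to its caller
        self.start = time.time()
        self.end = None

    def finish(self):
        self.end = time.time()
        return self

# A request entering the frontend fans out to two backend services.
trace_id = uuid.uuid4().hex[:16]
root = Span("frontend:/checkout", trace_id)
auth = Span("auth:verify_token", trace_id, parent_id=root.span_id).finish()
pay  = Span("payment:charge",    trace_id, parent_id=root.span_id).finish()
root.finish()

# Collected spans let an analysis backend rebuild the call tree and
# attribute latency to each service.
for span in (root, auth, pay):
    print(span.trace_id, span.span_id, span.parent_id, span.name)
```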