Resource management in HPC and cloud environments

Resource management, whether on a supercomputer or in the cloud, is the task of assigning resources such as compute nodes or storage to jobs, with the goal of maximizing resource utilization while fulfilling quality-of-service requirements. A challenge arises from the sometimes conflicting expectations of different stakeholders, including system providers, users, and application developers. Currently, we are working to improve the balance between compute and I/O performance on future exascale systems by taking advantage of emerging storage technologies, such as non-volatile memory, and of the dynamic allocation of compute and storage resources. Compared with classic HPC, cloud applications, which often serve large numbers of users simultaneously, face the added difficulty of highly unpredictable load conditions. In this area, we focus on multi-objective scheduling of microservices and auto-scaling of cloud-native applications.

Selected Publications