Dynamic Management of Supercomputer Resources

Job scheduling and resource management play an essential role in high-performance computing. The resources of a supercomputer are usually managed by a batch system, which is responsible for mapping jobs effectively onto resources (in most cases, compute nodes). From the system perspective, a batch system must ensure high system utilization and throughput; from the user perspective, it must ensure fast response times and fairness when allocating resources across jobs. Traditional batch systems perform only static resource management, that is, they support only jobs whose resource requirements remain fixed over their entire life cycle. However, this is no longer sufficient, as HPC applications now often exhibit unpredictably changing resource requirements, for example, to accommodate expanding data structures or to enable in-situ analyses for visualization. In general, changing requirements may involve a wide range of resource types, including compute nodes and different classes of storage. Moreover, runtime systems are becoming more adaptive by nature in order to lower energy consumption and support fault tolerance. To allow the system to modify the resource set assigned to a job as soon as the need arises, we develop dynamic resource management techniques for batch systems, which we believe will be indispensable in the future.

Contact

Technische Universität Darmstadt
Department of Computer Science

Laboratory for Parallel Programming

Office: S4|14 3.1.11
Mornewegstraße 30
64293 Darmstadt

Phone: +49 6151 16-21636
Email: wolf@cs.tu-darmstadt.de