Currently we are working on the following funded projects:
The demand for HPC resources is continuously rising, with AI being the main driver. However, high energy costs and variable renewable energy sources create serious challenges. In response, HPC providers may adapt their capacity dynamically, renewing interest in malleable jobs, which can adjust their resources as needed. AI training workloads, with their well-defined checkpoints, are naturally malleable. This project develops scheduling algorithms for adaptive workloads on variable-capacity systems, using theoretical analysis and simulation to optimize the efficiency of distributed deep learning.
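The checkpoint-based malleability described above can be sketched as follows. The `Job` class, the capacity trace, and all numbers are illustrative assumptions, not the project's scheduler:

```python
# Minimal sketch of checkpoint-based malleability: when the system's
# capacity changes, a running training job is resized at its next
# well-defined checkpoint and continues on the new number of workers.
# All names (Job, rescale) and values are illustrative only.

from dataclasses import dataclass

@dataclass
class Job:
    total_steps: int        # total training steps to run
    done_steps: int = 0     # steps completed so far
    workers: int = 1        # workers currently assigned

def run_until_checkpoint(job: Job, steps_per_checkpoint: int) -> None:
    """Advance the job to its next checkpoint."""
    job.done_steps = min(job.total_steps,
                         job.done_steps + steps_per_checkpoint)

def rescale(job: Job, new_capacity: int) -> None:
    """Adjust the job's worker count to the new system capacity."""
    job.workers = max(1, new_capacity)

# A capacity trace, e.g. driven by renewable-energy availability.
capacity_trace = [8, 4, 6]
job = Job(total_steps=30)
for capacity in capacity_trace:
    rescale(job, capacity)          # resize at the checkpoint boundary
    run_until_checkpoint(job, steps_per_checkpoint=10)

print(job.done_steps, job.workers)  # → 30 6
```

The essential point is that resizing only ever happens at checkpoint boundaries, which is what makes training workloads naturally malleable.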
Edge devices (e.g., autonomous vehicles, robots, and smart cameras) are often constrained by tight size and weight limits. Therefore, they typically use lightweight chips and multiple sensors for real-time environmental perception. Current solutions are often unreliable and require bulky processors, leading to high computational and energy costs. The Hessian Distr@l-4a project NEVEAI aims to enable advanced, reliable, and efficient AI on small edge devices by significantly reducing implementation costs and eliminating technical and regulatory complexities.
The diversity and complexity of domain-specific data structures across the many subdisciplines of computer science make it challenging for the scientific community to store research data in a reusable way. The NFDIxCS consortium aims to identify, define, and deploy services that store domain-specific data according to the FAIR principles, keeping research data together with its related metadata, the corresponding software, context, and execution instructions, and assembling all services into a cooperative and interoperable infrastructure.
The Message Passing Interface (MPI) is a widely used standard for communication in high-performance computing. However, in some cases, MPI calls can incur significant overhead due to decision tree traversal, limiting application performance. This project aims to implement an open-source LLVM compiler pass that performs optimizations to reduce latency in MPI calls, allowing seamless integration into existing development workflows.
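As an illustration of the overhead in question, the sketch below contrasts a runtime algorithm-selection branch with the statically specialized call a compiler pass could emit. The functions and the size threshold are hypothetical stand-ins, not MPI internals or the project's pass:

```python
# MPI libraries often pick a collective algorithm at runtime by walking
# a decision tree over message size, process count, etc. If a compiler
# pass can prove these values at compile time, it can call the chosen
# algorithm directly and skip the traversal. Illustrative stand-ins only.

def small_msg_algorithm(data):   # e.g. a latency-optimized variant
    return sum(data)

def large_msg_algorithm(data):   # e.g. a bandwidth-optimized variant
    return sum(data)

def generic_reduce(data):
    # Runtime decision tree, traversed on every call.
    if len(data) < 1024:
        return small_msg_algorithm(data)
    return large_msg_algorithm(data)

def specialized_reduce(data):
    # What a pass could emit when the message size is statically known
    # to be small: the branch is resolved at compile time.
    return small_msg_algorithm(data)

payload = list(range(8))
assert generic_reduce(payload) == specialized_reduce(payload)
print(specialized_reduce(payload))  # → 28
```

Both paths produce the same result; the specialized one simply avoids the per-call dispatch work, which is the latency the pass targets.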
The capabilities of modern computer simulations have the potential to change the industrial product design process by optimizing a vast number of design parameters while saving resources in production. Optimizing the process, however, often requires substantial amounts of computing time on HPC systems. In this project, we apply AI methods to improve and accelerate the determination of design parameters. At the same time, we use approximate and heterogeneous computing to reduce the processing time of the individual steps of the simulation process.
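One common way AI methods can accelerate such parameter determination is a surrogate model: a cheap approximation trained on a few expensive simulation runs, which is then used to screen candidate parameters. The sketch below illustrates the idea under strong simplifying assumptions (a quadratic stand-in for the simulation, an interpolating surrogate); it is not the project's method:

```python
# Surrogate-assisted parameter search: run the expensive simulation at
# a few sample points, fit a cheap model to the results, and screen
# many candidate parameters with the model instead of the simulation.
# The "simulation" and the surrogate are illustrative assumptions.

def expensive_simulation(x):
    # Stand-in for an HPC simulation; true optimum at x = 3.
    return (x - 3.0) ** 2

# Run the real simulation only at three sample points...
samples = [0.0, 2.0, 6.0]
values = [expensive_simulation(x) for x in samples]

# ...and use the quadratic interpolant through them (Lagrange form)
# as a surrogate, so screening needs no further simulation calls.
def surrogate(x):
    total = 0.0
    for i, (xi, yi) in enumerate(zip(samples, values)):
        term = yi
        for j, xj in enumerate(samples):
            if j != i:
                term *= (x - xj) / (xi - xj)
        total += term
    return total

candidates = [k / 100 for k in range(601)]   # 0.00, 0.01, ..., 6.00
best = min(candidates, key=surrogate)
print(best)  # → 3.0
```

Here three simulation runs replace six hundred; in practice the surrogate would be refined iteratively as promising candidates are verified with the real simulation.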
NHR4CES is a national high-performance computing center for computational engineering. In this project, RWTH Aachen and TU Darmstadt join forces to combine their existing strengths in HPC applications, algorithms and methods, and the efficient use of HPC hardware. NHR4CES aims to create an HPC ecosystem that combines best practices from HPC and research data management. Its focus is on engineering and materials science as well as engineering-oriented physics, chemistry, and medicine. Our contribution lies in the area of parallelism and performance.
Performance measurement is the key to understanding and ultimately improving the performance of HPC applications. Unfortunately, many HPC systems expose their jobs to substantial amounts of noise, leading to significant run-to-run variation. This makes performance measurements generally irreproducible, heavily complicating performance analysis and modeling. In this project, we develop methods and tools to make performance measurement and analysis more noise-resilient.
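The effect of run-to-run variation, and why robust statistics can help, can be illustrated as follows. The additive noise model and all numbers are assumptions for illustration only, not the project's methodology:

```python
# A single timing is unreliable under system noise: noise inflates some
# runs, so statistics such as the median or minimum over repetitions
# give a more reproducible estimate of the undisturbed runtime.
# The noise model below is an illustrative assumption.

import random
import statistics

random.seed(42)  # reproducible "measurements"

TRUE_RUNTIME = 10.0  # hypothetical noise-free runtime in ms

def measured_runtime():
    # System noise only ever adds time (OS jitter, contention, ...).
    noise = random.expovariate(1.0) * 2.0
    return TRUE_RUNTIME + noise

runs = [measured_runtime() for _ in range(20)]

# A single run can be far off; robust statistics are much closer.
print(f"one run : {runs[0]:.2f} ms")
print(f"mean    : {statistics.mean(runs):.2f} ms")
print(f"median  : {statistics.median(runs):.2f} ms")
print(f"minimum : {min(runs):.2f} ms")
```

Because the modeled noise is strictly additive, the minimum over repetitions bounds the noise-free runtime from above, which is one reason noise-resilient analyses prefer it over the mean.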