The inherent parallelism in the training and deployment of neural networks makes them a prime candidate for parallel computing. Our research in this area targets both optimizing neural networks themselves and using them as a tool to understand and improve the performance of arbitrary (parallel) programs. We have recently started developing tuning methods for deep neural networks, mainly targeting non-functional requirements such as inference speed, energy consumption, and network size under given accuracy constraints. Specific projects aim at efficient design-space exploration methods and at the programmability, efficiency, and performance portability of low-level network operations. At the same time, we are using neural networks to support compiler optimization and to improve our automatic performance-modeling toolchain.
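The kind of constrained design-space exploration mentioned above can be sketched roughly as follows. The tuning knobs (quantization bit width, channel-width multiplier) and the surrogate accuracy and latency models are purely illustrative assumptions for this sketch, not the group's actual tooling; in practice, the objective would be driven by real measurements of inference speed, energy, and model size.

```python
import itertools

# Hypothetical design space: two illustrative tuning knobs.
BIT_WIDTHS = [4, 8, 16, 32]      # quantization bit width (assumed knob)
WIDTH_MULTS = [0.25, 0.5, 1.0]   # channel-width multiplier (assumed knob)

def accuracy(bits, mult):
    # Toy surrogate: accuracy degrades with aggressive quantization
    # and with narrower networks. A real tool would measure this.
    return 0.95 - 0.04 * (32 - bits) / 28 - 0.10 * (1.0 - mult)

def latency_ms(bits, mult):
    # Toy surrogate: latency grows with bit width and network width.
    return 1.0 * (bits / 8) * mult

def explore(min_accuracy=0.88):
    """Exhaustively enumerate the design space and return the fastest
    configuration (latency, bits, mult) that still satisfies the
    accuracy constraint, or None if the constraint is infeasible."""
    feasible = [(latency_ms(b, m), b, m)
                for b, m in itertools.product(BIT_WIDTHS, WIDTH_MULTS)
                if accuracy(b, m) >= min_accuracy]
    return min(feasible) if feasible else None

best = explore()
```

With this toy model, the search settles on an aggressively quantized full-width network; tightening `min_accuracy` shrinks the feasible set, eventually to none. Real design spaces are far too large for exhaustive enumeration, which is why efficient exploration methods (e.g., search heuristics or surrogate models) are a research topic in their own right.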