Scientific Computing with GPUs

http://ppam.pl

Tutorial Organizers:

Dominik Goeddeke (Applied Mathematics, TU Dortmund, Germany)
Jakub Kurzak (EECS/ICL, University of Tennessee, USA)
Jan-Philipp Weiss (Karlsruhe Institute of Technology, Germany)

Tutorial Speakers:

Dominik Goeddeke
Jakub Kurzak
Jan-Philipp Weiss
Udeepta Bordoloi (HPC/GPU, AMD)
Tim Schroeder (NVIDIA)

Abstract:

GPUs are now an established platform for high-performance scientific computing, and a multitude of general and domain-specific programming environments, libraries and tools have emerged. The goal of this tutorial is to provide an overview of GPU Computing for mathematicians and computational scientists who want to harness the power of GPUs in their work. The first session assumes no prior knowledge. It provides an overview of ready-to-use GPU-accelerated mathematical libraries that make it possible to use GPUs quickly and with little effort, and of the architectural aspects that let GPUs outperform conventional CPUs on data-parallel workloads. The second session introduces OpenCL, an open standard for programming GPUs and multicore CPUs in a vendor- and hardware-independent way. Participants are invited to experiment with OpenCL during a hands-on coding session. The afternoon sessions target an intermediate to advanced audience. The NVIDIA CUDA ecosystem is presented, and performance tuning guidelines for AMD and NVIDIA hardware are explained. The tutorial concludes with two case studies covering parallelization and optimization aspects of dense and sparse linear algebra, and of direct and iterative solvers.

The course presenters are experts on GPU Computing from academia and industry, and have presented papers and tutorials on the topic at various conferences over the past several years.

Preliminary Programme:

Session 1: Introduction and First Steps (75 minutes)

  • Welcome, Motivation, Introduction (15 minutes)
  • Ready-to-use GPU-Accelerated Mathematical Libraries (30 minutes)
  • GPU Architecture (30 minutes)

Break (15 minutes)

Session 2: Programming with OpenCL (105 minutes)

  • Introduction to OpenCL (45 minutes)
  • Practical OpenCL Programming, Hands-on lab room session (60 minutes)

Lunch break (60-90 minutes)

Session 3: Performance Tuning and Advanced Programming (90-120 minutes)

  • The CUDA Ecosystem (25-30 minutes)
  • Performance Tuning for NVIDIA GPUs (25-30 minutes)
  • Performance Tuning for AMD GPUs and CPUs (45-60 minutes)

Break (30 minutes)

Session 4: Case Studies: Mathematical Building Blocks (90+ minutes)

  • BACUGen - AxB CUDA Generator (and Autotuner) (45 minutes)
  • Sparse Linear Algebra and Iterative Solvers (45 minutes)
  • Summary, Feedback, Discussion