Performance optimization for SpMV on multi-GPU systems using threads and multiple streams

Document Type

Conference Proceeding

Publication Date

1-3-2017

Abstract

Sparse matrix-vector multiplication (SpMV) is a key operation in scientific computing and engineering applications. This paper presents an optimization strategy that improves SpMV performance on multi-GPU systems by adopting OpenMP threads and multiple CUDA streams. We propose an efficient scheme that uses OpenMP threads to control multiple GPUs so that they jointly complete SpMV computations. Moreover, we adopt a streamed approach to increase concurrency and further improve SpMV performance. We use HYB (Hybrid ELL/COO), a hybrid sparse storage format, to demonstrate the effectiveness of the proposed approach. Our experimental results show that our approach achieves an average speedup of 3.80 over the existing SpMV implementation on a single GPU.
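The HYB format keeps up to a fixed number of nonzeros per row in ELL layout and spills the remaining entries of longer rows into COO. The Python sketch below illustrates that split together with a row-block partition in which one host thread drives each device, loosely mirroring the one-OpenMP-thread-per-GPU idea described above. It is a CPU-only sketch under stated assumptions: the caller-chosen ELL width, the -1 padding sentinel, and the contiguous row partitioning are illustrative choices, not the paper's exact scheme. CUDA streams and host-device transfers are not modeled, and the names `to_hyb`, `hyb_spmv_rows`, and `multi_device_spmv` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def to_hyb(num_rows, triplets, ell_width):
    """Split COO triplets (row, col, val) into an ELL part and a COO overflow part.

    Assumption: each row keeps at most `ell_width` entries in ELL; extra entries
    spill into COO. Unused ELL slots are padded with column -1 and value 0.0.
    """
    ell_cols = [[-1] * ell_width for _ in range(num_rows)]
    ell_vals = [[0.0] * ell_width for _ in range(num_rows)]
    fill = [0] * num_rows
    coo = []  # overflow entries as (row, col, val)
    for r, c, v in sorted(triplets):
        k = fill[r]
        if k < ell_width:
            ell_cols[r][k] = c
            ell_vals[r][k] = v
            fill[r] += 1
        else:
            coo.append((r, c, v))
    return ell_cols, ell_vals, coo


def hyb_spmv_rows(hyb, x, row_start, row_end):
    """Compute y[row_start:row_end] = A[row_start:row_end, :] @ x for a HYB matrix."""
    ell_cols, ell_vals, coo = hyb
    y = [0.0] * (row_end - row_start)
    # ELL part: a fixed number of slots per row, padding slots skipped.
    for r in range(row_start, row_end):
        acc = 0.0
        for c, v in zip(ell_cols[r], ell_vals[r]):
            if c >= 0:
                acc += v * x[c]
        y[r - row_start] = acc
    # COO part: accumulate overflow entries that fall in this row block.
    # (Each block scans the whole COO list; a real kernel would pre-split it.)
    for r, c, v in coo:
        if row_start <= r < row_end:
            y[r - row_start] += v * x[c]
    return y


def multi_device_spmv(hyb, x, num_rows, num_devices):
    """Split rows into contiguous blocks, one per hypothetical device, and let one
    host thread compute each block (standing in for one OpenMP thread per GPU)."""
    bounds = [num_rows * d // num_devices for d in range(num_devices + 1)]
    with ThreadPoolExecutor(max_workers=num_devices) as pool:
        parts = pool.map(
            lambda d: hyb_spmv_rows(hyb, x, bounds[d], bounds[d + 1]),
            range(num_devices),
        )
        y = []
        for part in parts:  # map yields results in device order
            y.extend(part)
    return y
```

For example, with the 4×4 matrix `[[1,0,2,0],[0,3,0,0],[4,5,6,7],[0,0,0,8]]` and an ELL width of 2, only row 2 overflows: its entries in columns 2 and 3 land in COO. Multiplying by `x = [1, 2, 3, 4]` across three worker threads gives `[7.0, 6.0, 60.0, 32.0]`. Choosing the ELL width trades ELL padding waste against the number of entries handled by the less regular COO part.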

Publication Title

Proceedings - 28th IEEE International Symposium on Computer Architecture and High Performance Computing Workshops, SBAC-PADW 2016

First Page Number

67

Last Page Number

72

DOI

10.1109/SBAC-PADW.2016.20
