Performance optimization for SpMV on multi-GPU systems using threads and multiple streams
Sparse matrix-vector multiplication (SpMV) is a key operation in scientific computing and engineering ap-plications. This paper presents an optimization strategy to improve SpMV performance on the multi-GPU systems by adopting OpenMP threads and multiple CUDA streams. We propose an efficient scheme to control multiple GPUs jointly complete SpMV computations by making use of OpenMP threads. Moreover, we adopt streamed approach to increase concurrency to further improve SpMV performance. In our paper, we use HYB (Hybrid ELL/COO), a hybrid sparse storage format, to demonstrate the effectiveness of our proposed approach. Our experimental results show that our approach achieves an average speedup of 3.80 over the existing SpMV implementation on a single GPU.
Proceedings - 28th IEEE International Symposium on Computer Architecture and High Performance Computing Workshops, SBAC-PADW 2016
First Page Number
Last Page Number
Guo, Ping and Zhang, Changjiang, "Performance optimization for SpMV on multi-GPU systems using threads and multiple streams" (2017). Kean Publications. 1640.