Multi-GPU implementation and performance optimization for CSR-based sparse matrix-vector multiplication
Sparse matrix-vector multiplication (SpMV) is a critical operation in scientific computing and engineering applications. CSR (Compressed Sparse Row) is the most popular sparse storage format, and CSR-based SpMV usually performs well on sparse matrices with a large number of non-zero elements. This paper presents our multi-GPU SpMV implementation, which improves CSR-based SpMV performance. We use multiple GPUs to jointly complete SpMV computations and adopt a streamed approach that increases concurrency to further improve performance. We evaluate our multi-GPU SpMV on a collection of fourteen sparse matrices and demonstrate the effectiveness of the proposed approach on a large-scale cluster. The average speedup achieved in our experiments is 6.68.
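For readers unfamiliar with the format, the following is a minimal sketch of CSR-based SpMV with the rows divided among several workers. It is not the authors' implementation: the abstract does not say how work is split across GPUs, so the contiguous row-block partitioning here is an assumption, the "devices" are simulated by a plain loop on the CPU, and all function names (`csr_spmv`, `partition_rows`, `multi_device_spmv`) are hypothetical.

```python
def csr_spmv(row_ptr, col_idx, vals, x, row_start=0, row_end=None):
    """Return y[row_start:row_end] for y = A @ x, with A stored in CSR form.

    row_ptr[i]..row_ptr[i+1] delimits the non-zeros of row i inside
    col_idx (column indices) and vals (values).
    """
    if row_end is None:
        row_end = len(row_ptr) - 1
    y = []
    for i in range(row_start, row_end):
        acc = 0.0
        for k in range(row_ptr[i], row_ptr[i + 1]):
            acc += vals[k] * x[col_idx[k]]
        y.append(acc)
    return y


def partition_rows(n_rows, n_parts):
    """Split [0, n_rows) into n_parts contiguous blocks of near-equal size.

    Assumption: balancing by row count; a real implementation might
    balance by non-zero count instead.
    """
    base, extra = divmod(n_rows, n_parts)
    bounds, start = [], 0
    for p in range(n_parts):
        end = start + base + (1 if p < extra else 0)
        bounds.append((start, end))
        start = end
    return bounds


def multi_device_spmv(row_ptr, col_idx, vals, x, n_devices):
    """Simulate multi-device SpMV: each 'device' computes one row block.

    On real hardware each block would be sent to its own GPU and the
    partial results gathered; here the blocks simply run one after another.
    """
    y = []
    for start, end in partition_rows(len(row_ptr) - 1, n_devices):
        y.extend(csr_spmv(row_ptr, col_idx, vals, x, start, end))
    return y


# 3x3 example matrix: [[1, 0, 2], [0, 3, 0], [4, 0, 5]]
row_ptr = [0, 2, 3, 5]
col_idx = [0, 2, 1, 0, 2]
vals = [1.0, 2.0, 3.0, 4.0, 5.0]
x = [1.0, 1.0, 1.0]
y = multi_device_spmv(row_ptr, col_idx, vals, x, n_devices=2)
```

Because each row of the result depends only on that row's non-zeros and on the shared vector `x`, the row blocks are independent, which is what makes it possible to distribute SpMV across devices and to overlap transfers with computation.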
2017 3rd IEEE International Conference on Computer and Communications, ICCC 2017
Guo, Ping and Zhang, Changjiang, "Multi-GPU implementation and performance optimization for CSR-based sparse matrix-vector multiplication" (2017). Kean Publications. 1600.