Parallel Computation course projects
The first assignment implemented matrix multiplication on the CPU in C, with a focus on cache locality and SIMD acceleration. Following the BLISlab tutorial, the implementation reached 100% of the performance of the BLAS benchmark. The second assignment implemented matrix multiplication on the GPU in C++ and CUDA, following the approach used in CUTLASS, and reached 80% of the performance of the cuBLAS benchmark library. The third assignment ran on Expanse, a high-performance computing cluster at the school, and used MPI to distribute the matrix multiplication work across multiple processors.
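To illustrate the cache-locality idea behind the first assignment, here is a minimal sketch of loop tiling (cache blocking) for square, row-major matrices. It is not the course submission and not the BLISlab implementation: the names `matmul_naive`, `matmul_blocked`, and the tile size `BLOCK` are hypothetical, and the sketch leaves out the packing and SIMD micro-kernel steps that a tuned version would add.

```c
#include <stddef.h>

/* Hypothetical tile edge length; a real implementation would tune this
 * so that the working set of three tiles fits in cache. */
#define BLOCK 32

static size_t min_sz(size_t a, size_t b) { return a < b ? a : b; }

/* Reference triple loop: C += A * B for n x n row-major matrices. */
void matmul_naive(size_t n, const double *A, const double *B, double *C)
{
    for (size_t i = 0; i < n; ++i) {
        for (size_t j = 0; j < n; ++j) {
            double sum = C[i * n + j];
            for (size_t k = 0; k < n; ++k)
                sum += A[i * n + k] * B[k * n + j];
            C[i * n + j] = sum;
        }
    }
}

/* Tiled version: C += A * B, computed one BLOCK x BLOCK tile at a time
 * so each tile of A, B, and C is reused while it is still in cache. */
void matmul_blocked(size_t n, const double *A, const double *B, double *C)
{
    for (size_t ii = 0; ii < n; ii += BLOCK) {
        for (size_t kk = 0; kk < n; kk += BLOCK) {
            for (size_t jj = 0; jj < n; jj += BLOCK) {
                /* Clamp tile bounds so n need not be a multiple of BLOCK. */
                size_t i_end = min_sz(ii + BLOCK, n);
                size_t k_end = min_sz(kk + BLOCK, n);
                size_t j_end = min_sz(jj + BLOCK, n);

                for (size_t i = ii; i < i_end; ++i) {
                    for (size_t k = kk; k < k_end; ++k) {
                        /* Hoist A[i][k]; the inner loop then walks rows
                         * of B and C with unit stride. */
                        double a = A[i * n + k];
                        for (size_t j = jj; j < j_end; ++j)
                            C[i * n + j] += a * B[k * n + j];
                    }
                }
            }
        }
    }
}
```

Both functions accumulate into `C`, so the caller is expected to zero `C` first when a plain product is wanted. The i-k-j order inside each tile keeps the innermost loop contiguous in memory, which is also the access pattern a compiler can most easily vectorize.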