High Performance Computing 1
Blocked Matrix Multiply
•Consider A, B, C to be n-by-n matrices, each viewed as an N-by-N matrix of b-by-b subblocks, where b = n/N is called the block size
•for i = 1 to N
•    for j = 1 to N
•        read block C(i,j) into fast memory
•        for k = 1 to N
•            read block A(i,k) into fast memory
•            read block B(k,j) into fast memory
•            C(i,j) = C(i,j) + A(i,k) * B(k,j)   {do a matrix multiply on blocks}
•        write block C(i,j) back to slow memory