High Performance Computing 1
Matrix Multiply - unblocked
for i = 1 to n
    read row i of A into fast memory
    for j = 1 to n
        read C(i,j) into fast memory
        read column j of B into fast memory
        for k = 1 to n
            C(i,j) = C(i,j) + A(i,k) * B(k,j)
        write C(i,j) back to slow memory
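The loop above can be sketched in C under a few assumptions. Matrices are n x n and stored row-major. "Fast memory" is modeled by explicit local buffers (a copy of row i of A, a copy of column j of B, and a scalar for C(i,j)). The names `matmul_unblocked` and `traffic_t` are hypothetical and exist only for illustration. The sketch also counts the words moved between slow and fast memory, so the traffic implied by the pseudocode can be checked directly. Real caches are managed by hardware, so this explicit copying is a model, not how a tuned library works.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical counters: words moved between slow memory (the full
   arrays A, B, C) and fast memory (the local buffers below). */
typedef struct {
    size_t words_read;
    size_t words_written;
} traffic_t;

/* Unblocked C = C + A*B for n x n row-major matrices, following the
   slide's loop order and copying operands into "fast memory" buffers. */
static traffic_t matmul_unblocked(size_t n, const double *A,
                                  const double *B, double *C)
{
    traffic_t t = {0, 0};
    assert(n > 0);

    double *a_row = malloc(n * sizeof *a_row); /* fast copy of row i of A */
    double *b_col = malloc(n * sizeof *b_col); /* fast copy of column j of B */
    if (a_row == NULL || b_col == NULL) {      /* allocation failed */
        free(a_row);
        free(b_col);
        return t;
    }

    for (size_t i = 0; i < n; i++) {
        /* read row i of A into fast memory: n words */
        for (size_t k = 0; k < n; k++)
            a_row[k] = A[i * n + k];
        t.words_read += n;

        for (size_t j = 0; j < n; j++) {
            /* read C(i,j) into fast memory: 1 word */
            double cij = C[i * n + j];
            t.words_read += 1;

            /* read column j of B into fast memory: n words */
            for (size_t k = 0; k < n; k++)
                b_col[k] = B[k * n + j];
            t.words_read += n;

            /* inner product entirely from fast memory: 2n flops */
            for (size_t k = 0; k < n; k++)
                cij += a_row[k] * b_col[k];

            /* write C(i,j) back to slow memory: 1 word */
            C[i * n + j] = cij;
            t.words_written += 1;
        }
    }

    free(a_row);
    free(b_col);
    return t;
}
```

Under this model, row reads of A cost n^2 words in total, reads and writes of C cost 2n^2, and column reads of B cost n^3, because every column of B is reread once for each of the n rows of A. Total traffic is therefore n^3 + 3n^2 words for 2n^3 flops, so only about 2 flops are performed per word moved, no matter how large n is.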