High Performance Computing 1
Matrix Multiply - unblocked
for i = 1 to n
    read row i of A into fast memory
    for j = 1 to n
        read C(i,j) into fast memory
        read column j of B into fast memory
        for k = 1 to n
            C(i,j) = C(i,j) + A(i,k) * B(k,j)
        write C(i,j) back to slow memory
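
The loop nest above can be sketched in Python to make the slow-memory traffic explicit. This is a minimal illustration, not a tuned implementation: the function name `matmul_unblocked` and the `traffic` counter are assumptions introduced here, and "fast memory" is modelled simply as local variables. Under this model the count comes to n^3 + 3n^2 words: n^2 to read A, n^3 to read columns of B, and 2n^2 to read and write C.

```python
def matmul_unblocked(A, B, C):
    """Compute C = C + A*B for n x n lists of lists.

    Returns (C, traffic), where traffic counts the words moved between
    slow and fast memory under the simple model in the pseudocode.
    """
    n = len(A)
    traffic = 0  # hypothetical counter, not part of the original slide
    for i in range(n):
        row_a = A[i][:]                 # read row i of A into fast memory
        traffic += n
        for j in range(n):
            c_ij = C[i][j]              # read C(i,j) into fast memory
            traffic += 1
            col_b = [B[k][j] for k in range(n)]  # read column j of B
            traffic += n
            for k in range(n):
                c_ij += row_a[k] * col_b[k]
            C[i][j] = c_ij              # write C(i,j) back to slow memory
            traffic += 1
    return C, traffic


if __name__ == "__main__":
    A = [[1, 2], [3, 4]]
    B = [[5, 6], [7, 8]]
    C = [[0, 0], [0, 0]]
    result, traffic = matmul_unblocked(A, B, C)
    print(result)   # [[19, 22], [43, 50]]
    print(traffic)  # 20, i.e. n^3 + 3n^2 for n = 2
```

The n^3 term comes from re-reading a full column of B for every (i, j) pair, which is the cost that dominates the traffic in the unblocked version.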