• Consider A, B, C to be N-by-N matrices of b-by-b subblocks, where b = n/N is called the block size
• for i = 1 to N
•   for j = 1 to N
•     read block C(i,j) into fast memory
•     for k = 1 to N
•       read block A(i,k) into fast memory
•       read block B(k,j) into fast memory
•       C(i,j) = C(i,j) + A(i,k) * B(k,j) {do a matrix multiply on blocks}
•     write block C(i,j) back to slow memory
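
The loop nest above can be sketched in plain Python. This is a minimal illustration, not a tuned implementation: the function name `blocked_matmul` and the `words_moved` counter are hypothetical, matrices are lists of lists, and the sketch assumes N divides n. Python has no explicit fast/slow memory, so each "read block" and "write block" step is modeled only by adding b*b words to the counter.

```python
def blocked_matmul(A, B, C, N):
    """Accumulate C += A * B on n-by-n matrices tiled into N-by-N blocks.

    Returns the number of words the pseudocode would move between slow
    and fast memory (a model of the traffic, not a measurement).
    """
    n = len(A)
    if n % N != 0:
        raise ValueError("this sketch assumes N divides n")
    b = n // N            # block size, b = n/N
    words_moved = 0

    for i in range(N):
        for j in range(N):
            words_moved += b * b          # read block C(i,j)
            for k in range(N):
                words_moved += 2 * b * b  # read blocks A(i,k) and B(k,j)
                # C(i,j) = C(i,j) + A(i,k) * B(k,j) on b-by-b blocks
                for ii in range(i * b, (i + 1) * b):
                    for kk in range(k * b, (k + 1) * b):
                        a_ik = A[ii][kk]
                        for jj in range(j * b, (j + 1) * b):
                            C[ii][jj] += a_ik * B[kk][jj]
            words_moved += b * b          # write block C(i,j) back


    return words_moved


if __name__ == "__main__":
    n, N = 4, 2
    A = [[float(r * n + c) for c in range(n)] for r in range(n)]
    # B is the identity, so C should come out equal to A.
    B = [[1.0 if r == c else 0.0 for c in range(n)] for r in range(n)]
    C = [[0.0] * n for _ in range(n)]
    moved = blocked_matmul(A, B, C, N)
    print(C == A, moved)
```

Counting the steps in the loop gives n^2 words to read C, n^2 to write it back, and N*n^2 each for A and B, so the counter equals (2N + 2) * n^2.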