High Performance Computing 1
Matrix Multiply blocked
•Number of slow memory references on blocked matrix multiply
•Matrices are n x n, split into N x N blocks of size b x b (b = n/N); f = 2*n^3 flops
• m =   N*n^2    read blocks of B: N^3 block reads in total, (n/N)^2 words each (N^3 * n/N * n/N = N*n^2)
•     + N*n^2    read blocks of A: N^3 block reads in total, (n/N)^2 words each
•     + 2*n^2    read and write each block of C once
•     = (2*N + 2)*n^2
•So q = f/m = 2*n^3 / ((2*N + 2)*n^2) = n/(N + 1)
•        ~= n/N = b  for large N (that is, large n at a fixed block size b)
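The count above can be checked with a small sketch that walks the blocked loop nest and tallies words moved instead of doing arithmetic. The function names are illustrative and not part of the slides. It assumes N divides n, that one block each of A, B and C fits in fast memory, and that no block is reused across iterations of the inner loop.

```python
def blocked_matmul_traffic(n, N):
    """Count words moved between slow and fast memory (m) for a blocked
    multiply of n x n matrices split into N x N blocks (hypothetical helper)."""
    assert n % N == 0, "assumption: N divides n evenly"
    b = n // N                      # block size, b = n/N
    words = 0
    for i in range(N):
        for j in range(N):
            words += b * b          # read block C(i,j) once
            for k in range(N):
                words += b * b      # read block A(i,k)
                words += b * b      # read block B(k,j)
            words += b * b          # write block C(i,j) once
    return words


def computational_intensity(n, N):
    """q = f/m with f = 2*n^3 flops."""
    f = 2 * n**3
    m = blocked_matmul_traffic(n, N)
    return f / m


if __name__ == "__main__":
    n, N = 1024, 32
    print("m =", blocked_matmul_traffic(n, N))   # equals (2*N + 2) * n**2
    print("q =", computational_intensity(n, N))  # equals n / (N + 1), close to b = n/N
```

The tally reproduces m = (2*N + 2)*n^2 exactly, and q comes out as n/(N + 1), which approaches b = n/N as N grows.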