High Performance Computing 1
Matrix Multiply unblocked
• Number of slow memory references in unblocked matrix multiply
• m =  n^3    read each column of B  n times
•         + n^2    read each row of A once (row i is read once for each i)
•         + 2*n^2 read and write each element of C once
•        =  n^3 + 3*n^2
• So q = f/m = (2*n^3)/(n^3 + 3*n^2), where f = 2*n^3 is the number of flops
•        ~= 2 for large n, no improvement over matrix-vector multiply
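
The count above can be checked with a short sketch that walks the unblocked i-j-k loop nest and tallies flops and slow memory references instead of doing the arithmetic. The function name `unblocked_matmul_counts` and the memory model (fast memory holds one row of A, one column of B, and one element of C at a time) are assumptions made for illustration, not part of the original slides.

```python
def unblocked_matmul_counts(n):
    """Tally flops and slow-memory references for unblocked n x n matrix multiply.

    Assumed model: fast memory holds one row of A, one column of B,
    and one element of C at a time.
    """
    flops = 0
    mem_refs = 0
    for i in range(n):
        mem_refs += n            # read row i of A (n elements), once per i
        for j in range(n):
            mem_refs += 1        # read C(i,j)
            mem_refs += n        # read column j of B (n elements)
            flops += 2 * n       # n multiplies and n adds for the inner k loop
            mem_refs += 1        # write C(i,j)
    return flops, mem_refs


if __name__ == "__main__":
    for n in (10, 100, 1000):
        f, m = unblocked_matmul_counts(n)
        # m should equal n^3 + 3*n^2, and q = f/m should approach 2 as n grows
        print(f"n={n:5d}  f={f}  m={m}  q={f / m:.4f}")
```

Under this model the tally matches m = n^3 + 3*n^2 exactly, and q stays below 2 for every n, which is the same ratio obtained for matrix-vector multiply.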