High Performance Computing 1
A sense of speed – vector ops
Loop  Operation             Operation per pass               Flops per pass
1     Update                v1(i) = v1(i) + a*v2(i)          2
2     4-fold vector update  v1(i) = v1(i) + Σ sk*vk(i)       8
3     Divide                v1(i) = v1(i) / v2(i)            1
4     Update + gather       v1(i) = v1(i) + s*v2(ind(i))     2
5     Bidiagonal            v1(i) = v2(i) - v3(i)*v1(i-1)    2
6     Inner product         s = s + v1(i)*v2(i)              2
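
The six loops listed above can be written out as a minimal sketch in plain Python, one function per loop. The function names are hypothetical and not from the slide. Indices are 0-based here, whereas the slide uses 1-based notation. The sum in loop 2 is assumed to run over four vectors, which matches the 8 flops per pass (4 multiplies and 4 adds). The starting value v1(0) of the bidiagonal recurrence is assumed to be supplied by the caller.

```python
def update(v1, v2, a):
    """Loop 1: v1(i) = v1(i) + a*v2(i); 2 flops per pass."""
    for i in range(len(v1)):
        v1[i] += a * v2[i]


def fourfold_update(v1, vs, s):
    """Loop 2: v1(i) = v1(i) + sum over k of s[k]*vs[k][i].

    With four vectors in vs (assumed), this is 8 flops per pass.
    """
    for i in range(len(v1)):
        for k in range(len(vs)):
            v1[i] += s[k] * vs[k][i]


def divide(v1, v2):
    """Loop 3: v1(i) = v1(i) / v2(i); 1 flop per pass."""
    for i in range(len(v1)):
        v1[i] /= v2[i]


def update_gather(v1, v2, ind, s):
    """Loop 4: v1(i) = v1(i) + s*v2(ind(i)); 2 flops per pass.

    v2 is read through the index array ind (indirect addressing).
    """
    for i in range(len(v1)):
        v1[i] += s * v2[ind[i]]


def bidiagonal(v1, v2, v3):
    """Loop 5: v1(i) = v2(i) - v3(i)*v1(i-1); 2 flops per pass.

    Each pass needs the result of the previous one, so the loop is a
    recurrence. v1[0] is taken as given (assumption) and the loop
    starts at i = 1.
    """
    for i in range(1, len(v1)):
        v1[i] = v2[i] - v3[i] * v1[i - 1]


def inner_product(v1, v2):
    """Loop 6: s = s + v1(i)*v2(i); 2 flops per pass."""
    s = 0.0
    for i in range(len(v1)):
        s += v1[i] * v2[i]
    return s
```

For a vector of length n, the flop count of one loop is n times its flops per pass, so dividing that by the measured run time gives a flop rate. Interpreted Python will not reflect hardware vector speeds; the sketch only pins down what each loop computes.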