High Performance Computing 1
Superscalar Processor
•Large off-chip Level 2 (L2) caches help keep data available. L1 cache data is accessed in ~1 cycle, L2 cache in ~4 cycles, and main memory can take several tens of times that!
•Efficiency is directly related to the reuse of data in cache
•Remedies:
–Blocked algorithms
–Contiguous storage
–Avoid non-unit strides and random/non-deterministic access
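
The three remedies can be sketched together on a matrix multiply. This is a minimal illustration rather than a tuned kernel: the function names `matmul_naive` and `matmul_blocked` and the tile-size parameter `bs` are hypothetical, the matrices are assumed square and stored contiguously in row-major order, and a good value for `bs` depends on the cache sizes of the target machine.

```c
#include <assert.h>
#include <stddef.h>

/* Naive i-j-k multiply, C = A * B, all matrices n x n, row-major.
   The inner loop walks B with stride n, so for large n nearly every
   access to B touches a different cache line. */
void matmul_naive(size_t n, const double *a, const double *b, double *c)
{
    for (size_t i = 0; i < n; i++) {
        for (size_t j = 0; j < n; j++) {
            double sum = 0.0;
            for (size_t k = 0; k < n; k++)
                sum += a[i * n + k] * b[k * n + j];  /* stride-n read of b */
            c[i * n + j] = sum;
        }
    }
}

/* Blocked multiply: the work is split into bs x bs tiles so that the
   tiles of A, B and C in use can stay in cache and be reused.
   bs is a hypothetical tuning parameter; it need not divide n. */
void matmul_blocked(size_t n, size_t bs,
                    const double *a, const double *b, double *c)
{
    assert(bs > 0);

    for (size_t i = 0; i < n * n; i++)
        c[i] = 0.0;

    for (size_t ii = 0; ii < n; ii += bs) {
        size_t i_end = (ii + bs < n) ? ii + bs : n;
        for (size_t kk = 0; kk < n; kk += bs) {
            size_t k_end = (kk + bs < n) ? kk + bs : n;
            for (size_t jj = 0; jj < n; jj += bs) {
                size_t j_end = (jj + bs < n) ? jj + bs : n;

                /* Multiply one tile of A by one tile of B. */
                for (size_t i = ii; i < i_end; i++) {
                    for (size_t k = kk; k < k_end; k++) {
                        double aik = a[i * n + k];  /* reused across j */
                        /* Unit-stride access to both b and c. */
                        for (size_t j = jj; j < j_end; j++)
                            c[i * n + j] += aik * b[k * n + j];
                    }
                }
            }
        }
    }
}
```

Both functions compute the same product; the blocked version only reorders the loops so that each tile is reused while it is still in cache and the innermost loop runs over consecutive memory locations.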