|
1
|
|
|
2
|
- flops – floating operations per second
- mega, giga, tera, peta
- often taken as aggregate ops of all the processors of a system
- technology of HPC machines is in the low teraflops regime
- workstations in the gigaflops range
- aggregate flops misleading measure
|
|
3
|
- clockspeed of chips is very fast
- most chips can perform 2-4 ops per second and can usually do some other
arithmetic
- but the chips can’t deliver data to the processors fast enough to keep
up with clock
|
|
4
|
- for example, pipelining
- break up data into several sub-operations that execute in different
sub-units in one cycle
- vector pipelining used to be the mark of supercomputers
- commodity clusters replaced vector machines; but making a comeback
|
|
5
|
|
|
6
|
- Almost all current computers use some pipelining e.g. IBM RS6000
- Speedup of instruction pipelining cannot always be achieved
- Next instruction may not be known till execution - e.g. branch
- Data for execution may not be available
|
|
7
|
- reduce clock cycle
- pipelining
- internal parallelism
- external parallelism (multiple processors)
|
|
8
|
- SISD
- single instruction, single data. this is the traditional von Neumann
model of computing – that is, a basic serial machine
- SIMD
- single instruction, multiple data. classical vector supercomputers, and
the old Connection Machines
|
|
9
|
- MISD
- multiple instruction, single data. no current hpc machine based on this
principal
- MIMD
- multiple instruction, multiple data. usual parallelism model
- could be shared memory, or distributed memory
|
|
10
|
- Use multiple processors
- Shared Memory (SMP: Symmetric Multi-processors)
- many processors accessing the same memory
- limited by memory-processors bandwidth
- SUN Ultra2, SGI Origin, Compaq
|
|
11
|
- Distributed memory
- many processors each with local memory and some type of high speed
interconnect
|
|
12
|
- multiple nodes, each with several processors that share memory locally
- best – and worst – of both models
- programming difficulties
|
|
13
|
- SPMD (Single Program Multiple Data)
- single program is run on all processors with different data
- each processor knows its ID -- thus
- if(proc ID .eq. N) then
- Else
- Constructs can be used for program control
|
|
14
|
- MPMD (Multiple Program Multiple Data)
- Different programs run on different processors
- often a master-slave model is used
|
|
15
|
|
|
16
|
|
|
17
|
- ·¾·¾·¾·
-
·¾·¾·¾·
-
·¾·¾·¾·
-
·¾·¾·¾·
|
|
18
|
- Processor
- Registers
- Cache (multiple levels)
- Memory
- Disk
|
|
19
|
- registers O(1) clock cycle bytes
- L1 cache O(10) cycles Kbytes
- L2 cache O(20) cycles Mbytes
- Memory O(100) cycles Gbytes
- Disk O(1000) cycles Tbytes
|
|
20
|
- Intelligent use of cache
- Smart units pre-fetch data
- multi-tasking to hide latency
|