1
Communication, Speedup
and Scalability
2
Message Passing Time
  • To send k bytes
  • tcomm(k) = tstartup + (h-1) tstart-hop + (k+k0) tsend + tblock


    • tstartup: total time spent setting up the communication
    • tstart-hop: time for switching each “hop” in wormhole routing
    • h: no. of hops       k: no. of bytes to transfer
    • k0: extra header bytes that are also moved
    • tsend: time to actually transfer 1 byte
    • tblock: time lost to blocked messages en route
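The timing model above can be evaluated directly. The following is a minimal sketch; the function name, argument names, and all parameter values are hypothetical and chosen only for illustration, not measured on any machine.

```python
def t_comm(k, h, t_startup, t_start_hop, t_send, k0=0, t_block=0.0):
    """Time to send k bytes over h hops, following the slide's model."""
    return t_startup + (h - 1) * t_start_hop + (k + k0) * t_send + t_block


# Hypothetical parameter values in seconds (assumptions, not measurements).
params = dict(t_startup=50e-6, t_start_hop=1e-6, t_send=10e-9, k0=64, t_block=0.0)

one_hop = t_comm(k=1024, h=1, **params)
four_hops = t_comm(k=1024, h=4, **params)

print(f"1 hop:  {one_hop * 1e6:.2f} us")
print(f"4 hops: {four_hops * 1e6:.2f} us")
```

With values like these the startup term dominates a 1 KB message, and each extra hop adds only tstart-hop.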


3
Communication Model
  • Speed = k/tcomm
  • Actual speed << theoretical hardware limit advertised
  • Consequences
    • Send messages in blocks -- avoid small single messages
    • Arrange data distributions to get nearest-neighbor communication, e.g. use a ring shift with direct neighbors
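To see why blocks beat small single messages, the sketch below evaluates Speed = k/tcomm for several message sizes. It assumes a simplified one-hop model (h = 1, tblock = 0), and the default parameter values are hypothetical.

```python
def t_comm(k, t_startup=50e-6, t_send=10e-9, k0=64):
    """Simplified one-hop message time: h = 1 and t_block = 0 (assumption)."""
    return t_startup + (k + k0) * t_send


def speed(k):
    """Effective transfer speed k / t_comm in bytes per second."""
    return k / t_comm(k)


peak = 1.0 / 10e-9  # asymptotic limit 1 / t_send for very large messages

for k in (8, 1024, 1024 * 1024):
    print(f"k = {k:>8} bytes: {speed(k) / peak:6.1%} of peak")

# Sending 1000 values of 8 bytes each: one block versus 1000 separate messages.
t_one_block = t_comm(8 * 1000)
t_small_msgs = 1000 * t_comm(8)
print(f"one block: {t_one_block * 1e6:.1f} us")
print(f"1000 small messages: {t_small_msgs * 1e6:.1f} us")
```

In this model every message pays tstartup once, so a single block amortizes it over all the data while 1000 small messages pay it 1000 times.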
4
Communication Model
  • Program with logical processor numbers


5
Communication Model
  • Latency hiding: use asynchronous messaging to overlap communication and computation (MPI_ISEND, MPI_IRECV)
    • Domain decomposition in solving grid problems: compute the boundary points first and communicate those while computing the interior
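The benefit of overlapping can be estimated with a simple cost model. This sketch does not call MPI. It assumes ideal overlap, meaning the asynchronous transfer proceeds entirely in the background, which a real implementation may only approximate. All function names and timings are hypothetical.

```python
def time_blocking(t_boundary, t_interior, t_comm):
    """Compute the whole subdomain, then exchange boundary data with blocking calls."""
    return t_boundary + t_interior + t_comm


def time_overlapped(t_boundary, t_interior, t_comm):
    """Compute the boundary, start the asynchronous exchange, compute the
    interior while messages are in flight, then wait for completion."""
    return t_boundary + max(t_interior, t_comm)


# Hypothetical per-step costs in milliseconds.
t_boundary, t_interior, t_comm = 1.0, 8.0, 3.0

blocking = time_blocking(t_boundary, t_interior, t_comm)
overlapped = time_overlapped(t_boundary, t_interior, t_comm)
print(f"blocking: {blocking} ms, overlapped: {overlapped} ms")
```

Under this assumption the communication is hidden completely whenever the interior computation takes at least as long as the exchange.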



6
Amdahl’s Law
  • Consider the execution of a program on p processors, and let a fraction q (0<q<1) of the operations be parallelized. Maximum speedup:
    • spfalse = t1/tp = 1/[(q/p) + (1-q)]
    • Indicates the rapid loss of speedup as p increases if the parallel fraction is not high enough
    • To get 50% efficiency, i.e. a speedup of 256 on 512 processors, q must be about 0.998
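The 512-processor figure can be checked numerically. This is a small sketch using the standard form of Amdahl's law, speedup = 1/(q/p + (1-q)); the function names are illustrative.

```python
def amdahl_speedup(q, p):
    """Maximum speedup on p processors when a fraction q of the work is parallel."""
    return 1.0 / (q / p + (1.0 - q))


def required_fraction(speedup, p):
    """Parallel fraction q needed for a given speedup on p processors,
    from solving speedup = 1 / (q/p + 1 - q) for q."""
    return (1.0 - 1.0 / speedup) / (1.0 - 1.0 / p)


q_needed = required_fraction(256, 512)
print(f"q needed for speedup 256 on 512 processors: {q_needed:.6f}")

for q in (0.9, 0.99, 0.998):
    s = amdahl_speedup(q, 512)
    print(f"q = {q}: speedup {s:.1f}, efficiency {s / 512:.1%}")
```

Even with 99% of the operations parallelized, the speedup on 512 processors stays below 100.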
7
Amdahl’s Law
8
Amdahl’s Law
  • Why is this speedup “false”?
    • Assumes that the no. of ops is the same for sequential and parallel -- usually algorithms and data structures are different
    • Does not account for parallelization cost -- communication and synchronization costs!
    • Assumes that performance does not change between sequential and parallel code (diff. vector length ...)
9
Speedup (honest)
  • sphon = t1/tp, with t1 for the best sequential algorithm and tp for the real parallel algorithm
      • =  [t1..]/[...+hbas +php] (complex form, difficult to use)
  • hp: communication time that depends on p
    • As p --> infty, sphon --> 0

10
Scalability
  • There is an optimal number of processors for each problem


  • Fixed problem size with increasing numbers of processors is a poor use of parallel machine
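A toy cost model illustrates why a fixed-size problem has an optimal processor count. The form tp = t1/p + h_fixed + p*h_per_proc is an assumption made for this sketch, not the formula from these slides, and all values are hypothetical.

```python
import math


def speedup(p, t1, h_fixed, h_per_proc):
    """Speedup t1 / t_p under the assumed model
    t_p = t1/p + h_fixed + p * h_per_proc (a toy model, see lead-in)."""
    t_p = t1 / p + h_fixed + p * h_per_proc
    return t1 / t_p


# Hypothetical costs in seconds for one fixed problem size.
t1, h_fixed, h_per_proc = 100.0, 0.01, 0.001

# Setting d(t_p)/dp = 0 gives the optimum of this toy model.
p_opt = math.sqrt(t1 / h_per_proc)
print(f"optimal p is about {p_opt:.0f}")

for p in (1, 16, 128, round(p_opt), 4096, 65536):
    print(f"p = {p:>6}: speedup {speedup(p, t1, h_fixed, h_per_proc):7.1f}")
```

Past the optimum the communication term grows faster than the compute term shrinks, so adding processors to the same problem lowers the speedup.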
11
Scalability
  • Increasing problem size with increasing numbers of processors leads to better use of parallel machine


12
Scalability
  • Now let the problem size m --> infty as p --> infty


13
Scalability
  • Thus scalability is the desired measure of a parallel algorithm/code and not speedup!


  • Scalability is achieved if the quantity
      • [hp*p/m] is constant or increases very slowly as p increases
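The criterion can be explored by tabulating hp*p/m for different growth rules. In this sketch the communication costs (constant, or growing like log2 p) and the problem sizes are hypothetical assumptions chosen for illustration.

```python
import math


def scalability_metric(h_p, p, m):
    """The quantity h_p * p / m from the scalability criterion."""
    return h_p * p / m


procs = [2, 16, 128, 1024]

# Case 1 (assumed): m grows in proportion to p, communication cost constant.
scaled_const = [scalability_metric(1.0, p, 1000 * p) for p in procs]

# Case 2 (assumed): m grows in proportion to p, communication cost ~ log2(p).
scaled_log = [scalability_metric(math.log2(p), p, 1000 * p) for p in procs]

# Case 3 (assumed): fixed problem size m, communication cost constant.
fixed = [scalability_metric(1.0, p, 1000) for p in procs]

for p, a, b, c in zip(procs, scaled_const, scaled_log, fixed):
    print(f"p = {p:>4}: scaled/const {a:.4f}  scaled/log {b:.4f}  fixed {c:.4f}")
```

Under these assumptions the quantity stays constant in case 1, rises slowly in case 2, and rises linearly with p in case 3, the fixed-size case.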