1	Communication, Speedup and Scalability
2	Message Passing Time
Time to send k bytes:
t_comm(k) = t_startup + (h-1) t_start-hop + (k + k₀) t_send + t_block
- t_startup: total time to set up the communication
- t_start-hop: time to switch each "hop" in wormhole routing
- h: number of hops
- k: number of bytes to transfer
- k₀: extra header bytes that are also moved
- t_send: time to actually transfer one byte
- t_block: time lost to messages blocked en route
3	Communication Model
Speed = k/t_comm
Actual speed << theoretical hardware limit advertised
Consequences:
- Send messages in blocks; avoid small single messages
- Arrange data distributions to get nearest-neighbor communication, e.g. use a ring shift with direct neighbors
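The gap between actual and advertised speed can be made concrete with a minimal Python sketch of the t_comm(k) model. All machine parameters below (startup time, per-hop time, per-byte time, header size) are hypothetical values chosen only for illustration, not measurements of any real machine.

```python
def t_comm(k, t_startup, t_start_hop, t_send, h=1, k0=0, t_block=0.0):
    """Time to send k bytes over h hops, following the t_comm(k) model."""
    return t_startup + (h - 1) * t_start_hop + (k + k0) * t_send + t_block


def effective_speed(k, **params):
    """Achieved transfer rate in bytes per second: k / t_comm(k)."""
    return k / t_comm(k, **params)


# Hypothetical machine parameters (assumptions, for illustration only).
params = dict(
    t_startup=50e-6,   # 50 microseconds to set up a message
    t_start_hop=1e-6,  # 1 microsecond for each additional hop
    t_send=1e-9,       # 1 ns per byte, i.e. an advertised peak of 1 GB/s
    h=3,               # three hops
    k0=64,             # 64 header bytes
    t_block=0.0,       # no blocking en route
)

peak = 1.0 / params["t_send"]  # advertised peak in bytes per second

for k in (8, 1_024, 1_048_576):
    fraction = effective_speed(k, **params) / peak
    print(f"k = {k:>9} bytes: {fraction:8.3%} of peak")

# One block of 1000 eight-byte values versus 1000 single-value messages.
one_block = t_comm(8_000, **params)
many_small = 1_000 * t_comm(8, **params)
print(f"one block: {one_block:.3e} s, 1000 small messages: {many_small:.3e} s")
```

Under these assumed numbers the startup term dominates small messages, which is why sending data in blocks pays off.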
4	Communication Model Program with logical processor numbers
5	Communication Model
- Latency hiding: use asynchronous messaging to overlap communication and computation (MPI_ISEND, MPI_IRECV)
- Domain decomposition in solving grid problems: compute the boundary points first, then communicate those while computing the interior points
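The benefit of latency hiding can be estimated with a simple timing model. This is a sketch under an idealized assumption: once a non-blocking exchange has started, it proceeds entirely in the background. The function names and the per-step times are hypothetical.

```python
def step_time_blocking(t_boundary, t_interior, t_comm):
    """Compute the whole subdomain, then exchange boundary data with blocking messages."""
    return t_boundary + t_interior + t_comm


def step_time_overlapped(t_boundary, t_interior, t_comm):
    """Compute the boundary, start a non-blocking exchange, compute the interior meanwhile.

    Idealized: the exchange costs only whatever exceeds the interior compute time.
    """
    return t_boundary + max(t_interior, t_comm)


# Hypothetical per-step times in seconds (assumptions, for illustration only).
t_boundary, t_interior, t_comm = 0.002, 0.020, 0.015

blocking = step_time_blocking(t_boundary, t_interior, t_comm)
overlapped = step_time_overlapped(t_boundary, t_interior, t_comm)
print(f"blocking:   {blocking:.3f} s per step")
print(f"overlapped: {overlapped:.3f} s per step")
```

In this model the communication is fully hidden whenever the interior computation takes at least as long as the exchange.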
6	Amdahl’s Law
Consider the execution of a program on p processors, and let the fraction q (0<q<1) of the operations be parallelized.
Maximum speedup: sp_false = t₁/t_p = 1/[(q/p) + (1-q)]
- Indicates the rapid loss of speedup as p increases if the parallel fraction is not high enough
- To get 50% efficiency, i.e. a speedup of 256 on 512 processors, requires q = 0.998
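The 50% efficiency figure can be checked numerically. The sketch below evaluates the Amdahl speedup 1/[(q/p) + (1-q)] and inverts it for q; the function names are illustrative, not taken from the slides.

```python
def amdahl_speedup(q, p):
    """Amdahl speedup for parallel fraction q on p processors."""
    return 1.0 / (q / p + (1.0 - q))


def required_fraction(speedup, p):
    """Parallel fraction q needed to reach a given speedup on p processors.

    Obtained by solving 1/(q/p + 1 - q) = speedup for q.
    """
    return (1.0 - 1.0 / speedup) / (1.0 - 1.0 / p)


p = 512
q_needed = required_fraction(256, p)
print(f"q needed for speedup 256 on {p} processors: {q_needed:.4f}")

# Speedup on 512 processors for a few parallel fractions.
for q in (0.90, 0.99, 0.998):
    s = amdahl_speedup(q, p)
    print(f"q = {q}: speedup = {s:.1f}, efficiency = {s / p:.1%}")
```

Even with 99% of the operations parallelized, the speedup on 512 processors stays below 100, since the serial fraction bounds it by 1/(1-q).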
7	Amdahl’s Law
8	Amdahl’s Law
Why is this speedup "false"?
- It assumes that the number of operations is the same for the sequential and parallel codes; usually the algorithms and data structures are different
- It does not account for parallelization cost: communication and synchronization costs!
- It assumes that performance does not change between sequential and parallel code (different vector lengths ...)
9	Speedup_honest
sp_hon = t₁/t_p, with t₁ for the best sequential algorithm and t_p for the real parallel algorithm
sp_hon = [t1..]/[...+h_bas + p h_p] (complex form, difficult to use)
- h_p: communication time that depends on p
- As p --> infinity, sp_hon --> 0
10	Scalability
- There is an optimal number of processors for each problem
- A fixed problem size with increasing numbers of processors is a poor use of a parallel machine
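The existence of an optimal processor count can be illustrated with a deliberately simple model. This is an assumption made for the sketch, not the honest-speedup formula from the slides: the work divides perfectly as t1/p, while communication costs p*h with a constant per-processor cost h. The values of t1 and h are hypothetical.

```python
import math


def parallel_time(p, t1, h):
    """Assumed model: perfectly divisible work t1/p plus communication cost p*h."""
    return t1 / p + p * h


def speedup(p, t1, h):
    """Speedup of the assumed model relative to the sequential time t1."""
    return t1 / parallel_time(p, t1, h)


def optimal_processors(t1, h):
    """Continuous minimizer of t1/p + p*h, from d/dp: -t1/p**2 + h = 0."""
    return math.sqrt(t1 / h)


t1 = 100.0  # hypothetical sequential run time in seconds
h = 0.01    # hypothetical communication cost per processor in seconds

p_opt = optimal_processors(t1, h)
print(f"optimal processor count in this model: {p_opt:.0f}")

for p in (10, 50, 100, 200, 1000):
    print(f"p = {p:>4}: speedup = {speedup(p, t1, h):5.1f}")
```

In this model the speedup rises up to p = sqrt(t1/h) and falls beyond it, so adding processors to a fixed-size problem eventually slows it down.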
11	Scalability
Increasing the problem size with increasing numbers of processors leads to better use of a parallel machine
12	Scalability
Now let the problem size m --> infinity as p --> infinity
13	Scalability
- Thus scalability, and not speedup, is the desired measure of a parallel algorithm/code!
- Scalability is achieved if the quantity [h_p*p/m] is constant or increases very slowly as p increases