1	Codes, Timing, MPE
2	Timing
    Speedup = (time on 1 processor) / (time on p processors)
    Timings should not include wait time or I/O.
    Self-scheduling: a master/slave relationship. Works well when the slaves do not need to communicate among themselves.
3	Timing
6	Scheduling
    Matrix-vector multiply, A*b = c
    Each row of A in turn is multiplied by b; the master reassembles the product vector c.
    The master sends b to all processors.
    The master sends one row at a time to each slave.
    The slaves send the dot products back to the master, each with a tag identifying the row.
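The arithmetic behind this scheme can be sketched serially in C. This is only an illustration of the row-by-row work and the reassembly step, with no MPI calls; the names `row_dot` and `matvec_by_rows` are hypothetical and are not taken from MatVec.f, and the row index stands in for the message tag.

```c
#include <stddef.h>

/* Work done by one slave: dot product of one row of A with b. */
double row_dot(const double *row, const double *b, size_t n)
{
    double sum = 0.0;
    for (size_t j = 0; j < n; j++)
        sum += row[j] * b[j];
    return sum;
}

/* Serial stand-in for the master: hand out rows one at a time and
 * store each returned dot product at the position given by its "tag"
 * (here simply the row index). A is n x n, stored row-major. */
void matvec_by_rows(const double *a, const double *b, double *c, size_t n)
{
    for (size_t tag = 0; tag < n; tag++)
        c[tag] = row_dot(&a[tag * n], b, n);
}
```

In the parallel version the results may come back in any order, which is why each one carries a tag.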
7	MatVec.f
12	Scalability
    To compute c, each row requires n multiplies and n-1 adds, so the total work is n(n + [n-1]) = 2n² - n operations.
    Total computation time is (2n² - n) T_comp.
13	Scalability
    Ignore the communication of b (assume it is already there).
    Ignore the effect of message size on communication.
    Each row requires n values sent (the elements of that row of A) plus 1 for the answer sent back, so the total communication is n(n + 1) = n² + n.
    Total communication time is (n² + n) T_comm.
14	Scalability
    Ratio of communication to computation is [(n² + n) T_comm] / [(2n² - n) T_comp], with T_comm >> T_comp.
    We would like the ratio to decrease with n, but here the n-dependent factor asymptotes to 1/2.
    This means communication is always a bottleneck.
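The limit quoted on this slide follows from cancelling one factor of n. A short worked step, using only the operation counts given on the slides:

```latex
\[
\frac{(n^2+n)\,T_{\mathrm{comm}}}{(2n^2-n)\,T_{\mathrm{comp}}}
= \frac{n+1}{2n-1}\cdot\frac{T_{\mathrm{comm}}}{T_{\mathrm{comp}}}
\;\longrightarrow\;
\frac{1}{2}\cdot\frac{T_{\mathrm{comm}}}{T_{\mathrm{comp}}}
\qquad (n\to\infty)
\]
```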
15	Scalability: MatMat
    Assume B is distributed to all slaves.
    Each element of C is the dot product of a row of A with a column of B.
    To compute C, each of its n² elements requires n multiplies and n-1 adds, so the total work is n²(n + [n-1]) = 2n³ - n².
16	Scalability: MatMat
    Again, ignore the effect of message size on communication.
    Each row requires n values sent (a row of A) plus n for the answer row sent back, so the total communication is n(2n) = 2n².
17	Scalability: MatMat
    Ratio is [(2n²) T_comm] / [(2n³ - n²) T_comp], with T_comm >> T_comp.
    The n-dependent factor asymptotes to 1/n.
    Relatively better performance for large n.
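To see the two trends side by side, the n-dependent factors of the matrix-vector and matrix-matrix ratios can be evaluated directly. A minimal sketch in C; the function names are hypothetical, and T_comm/T_comp is left out because it is a machine-dependent constant.

```c
/* Factor multiplying T_comm/T_comp for matrix-vector multiply:
 * (n^2 + n) / (2n^2 - n), which tends to 1/2 as n grows. */
double matvec_ratio(double n)
{
    return (n * n + n) / (2.0 * n * n - n);
}

/* Factor multiplying T_comm/T_comp for matrix-matrix multiply:
 * 2n^2 / (2n^3 - n^2), which behaves like 1/n as n grows. */
double matmat_ratio(double n)
{
    return (2.0 * n * n) / (2.0 * n * n * n - n * n);
}
```

For n = 1000 the first factor is still about 0.5, while the second has dropped to about 0.001.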
18	MPE
    'Instrumenting' the code with the MPE library
    Create logfiles; describe states and events
    upshot displays the output graphically; raw numbers could be displayed instead
    Shows, e.g., time spent in send, recv, compute, Bcast
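19	MPE
    int MPI_Bcast( void *buf, int count, MPI_Datatype datatype,
                   int root, MPI_Comm comm )
    {
        int result;
        /* log the start of the broadcast state */
        MPE_Log_event( S_BCAST_EVENT, Bcast_ncalls, (char *)0 );
        /* call the real routine through the profiling interface */
        result = PMPI_Bcast( buf, count, datatype, root, comm );
        /* log the end of the broadcast state */
        MPE_Log_event( E_BCAST_EVENT, Bcast_ncalls, (char *)0 );
        return result;
    }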
20	MPE
    int MPI_Init( int *argc, char ***argv )
    {
        int procid, returnVal;
        returnVal = PMPI_Init( argc, argv );
        MPE_Initlog();
        MPI_Comm_rank( MPI_COMM_WORLD, &procid );
        /* only process 0 describes the states */
        if (procid == 0) {
            MPE_Describe_state( S_SEND_EVENT, E_SEND_EVENT, "Send", "blue:gray3" );
            MPE_Describe_state( S_RECV_EVENT, E_RECV_EVENT, "Recv", "green:light_gray" );
            ...
        }
        return returnVal;
    }
21	Jacobi iteration
    Solve ∆u = f on the unit square 0 < x, y < 1
    u = g on the boundary
    Define grid points $(x_i, y_j)$, $i, j = 0, \ldots, n+1$
    Set h = 1/(n+1)
22	Jacobi iteration
    $$u^{k+1}_{i,j} = \frac{1}{4}\left(u^{k}_{i-1,j} + u^{k}_{i+1,j} + u^{k}_{i,j-1} + u^{k}_{i,j+1} - h^{2} f_{i,j}\right)$$
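One sweep of this update can be sketched in serial C as follows. This is a minimal illustration, not the code from the slides: the function name `jacobi_sweep` and the row-major storage of the (n+2) x (n+2) grid are assumptions, and the boundary values (indices 0 and n+1) are copied unchanged.

```c
#include <stddef.h>

/* One Jacobi sweep on an (n+2) x (n+2) grid stored row-major.
 * u holds iterate k, unew receives iterate k+1, f holds the
 * right-hand side, h is the grid spacing 1/(n+1). */
void jacobi_sweep(const double *u, double *unew, const double *f,
                  size_t n, double h)
{
    size_t m = n + 2;

    /* Copy everything first so the boundary values carry over. */
    for (size_t k = 0; k < m * m; k++)
        unew[k] = u[k];

    /* Update interior points from the four neighbors in the old iterate. */
    for (size_t i = 1; i <= n; i++)
        for (size_t j = 1; j <= n; j++)
            unew[i * m + j] = 0.25 * (u[(i - 1) * m + j] + u[(i + 1) * m + j]
                                    + u[i * m + j - 1] + u[i * m + j + 1]
                                    - h * h * f[i * m + j]);
}
```

Because every new value depends only on the old iterate, the two arrays are kept separate and swapped between sweeps.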
23	Jacobi iteration
25	Cartesian topology
26	MPI_Cart_create
      integer dims(2)
      logical isperiodic(2), reorder
      integer ndim, comm2d, ierr
c     3 x 3 process grid, not periodic in either direction
      dims(1) = 3
      dims(2) = 3
      isperiodic(1) = .false.
      isperiodic(2) = .false.
      reorder = .true.
      ndim = 2
      call MPI_CART_CREATE(MPI_COMM_WORLD, ndim, dims, isperiodic,
     $                     reorder, comm2d, ierr)
27	MPI_Cart_create
    Find neighbors
      call MPI_CART_GET(comm2d, 2, dims, isperiodic, coords, ierr)
c     myrank in comm2d
      call MPI_COMM_RANK(comm2d, myrank, ierr)
c     get coordinates of a process
      call MPI_CART_COORDS(comm2d, myrank, 2, coords, ierr)
c     find destination and source of shifting
      call MPI_CART_SHIFT(comm2d, 1, 1, nbrbottom, nbrtop, ierr)
28	Cart_create
c     This routine shows how to determine the neighbors in
c     a 2-d decomposition of the domain.
c     Assumes that MPI_Cart_create has already been called
      subroutine fnd2dnbrs( comm2d,
     $                      nbrleft, nbrright, nbrtop, nbrbottom )
      integer comm2d, nbrleft, nbrright, nbrtop, nbrbottom
      integer ierr
      call MPI_Cart_shift( comm2d, 0, 1, nbrleft, nbrright, ierr )
      call MPI_Cart_shift( comm2d, 1, 1, nbrbottom, nbrtop, ierr )
      return
      end
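What a shift returns on a non-periodic grid can be illustrated without MPI. The sketch below is a plain C stand-in, assuming the row-major rank numbering that MPI uses for Cartesian topologies. The names `cart_rank`, `cart_shift` and `NO_NEIGHBOR` are hypothetical; `NO_NEIGHBOR` plays the role that MPI_PROC_NULL plays at the edge of the grid.

```c
#define NO_NEIGHBOR (-1)  /* hypothetical stand-in for MPI_PROC_NULL */

/* Rank of the process at coordinates (c0, c1) in a dims0 x dims1
 * non-periodic grid with row-major numbering, or NO_NEIGHBOR if the
 * coordinates fall off the grid. */
int cart_rank(int dims0, int dims1, int c0, int c1)
{
    if (c0 < 0 || c0 >= dims0 || c1 < 0 || c1 >= dims1)
        return NO_NEIGHBOR;
    return c0 * dims1 + c1;
}

/* Source and destination ranks for a shift by disp along dimension
 * dir (0 or 1), as seen from the process with the given rank. */
void cart_shift(int dims0, int dims1, int rank, int dir, int disp,
                int *src, int *dest)
{
    int c0 = rank / dims1;   /* coordinate along dimension 0 */
    int c1 = rank % dims1;   /* coordinate along dimension 1 */

    if (dir == 0) {
        *src  = cart_rank(dims0, dims1, c0 - disp, c1);
        *dest = cart_rank(dims0, dims1, c0 + disp, c1);
    } else {
        *src  = cart_rank(dims0, dims1, c0, c1 - disp);
        *dest = cart_rank(dims0, dims1, c0, c1 + disp);
    }
}
```

On a 3 x 3 grid the center process (rank 4) has four real neighbors, while a corner process gets NO_NEIGHBOR in two of the four directions.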
29	Cart_create
      subroutine fnd2ddecomp( comm2d, n, sx, ex, sy, ey )
      integer comm2d
      integer n, sx, ex, sy, ey
      integer dims(2), coords(2), ierr
      logical periods(2)
      call MPI_Cart_get( comm2d, 2, dims, periods, coords, ierr )
      call MPE_DECOMP1D( n, dims(1), coords(1), sx, ex )
      call MPE_DECOMP1D( n, dims(2), coords(2), sy, ey )
      return
      end
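The kind of 1-D block split that the routine above applies in each direction can be sketched in C. This is a hypothetical `decomp1d`, not the MPE source: it assumes indices run from 1 to n and that any remainder is given to the lowest-numbered processes, which may differ in detail from what MPE_DECOMP1D does.

```c
/* Split indices 1..n among nprocs processes as evenly as possible.
 * Process myid (0-based) receives the inclusive range [*s, *e];
 * the first (n % nprocs) processes each get one extra index. */
void decomp1d(int n, int nprocs, int myid, int *s, int *e)
{
    int nlocal = n / nprocs;   /* base block size */
    int extra  = n % nprocs;   /* indices left over */

    /* Start after the blocks of all lower-numbered processes. */
    *s = myid * nlocal + 1 + (myid < extra ? myid : extra);
    if (myid < extra)
        nlocal++;
    *e = *s + nlocal - 1;
}
```

For n = 10 on 3 processes this gives the ranges 1..4, 5..7 and 8..10.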