Monday, September 17, 2007

Parallel Programming: MPI


Parallelization - Parallel Knock

We describe a simple example of message passing in a parallel code. For those you learning about parallel code, we provide the following discussion of elemental message-passing as an introduction to how communication can be performed in parallel codes.
Knock
This example, knock.c, passes a simple message from the even processors to the odd ones. Then, the odd processors send a reply in a message back to the even ones. The "conversation" is displayed on all processors, and the code ends.

This code's main routine performs the appropriate message-passing calls. In a more complex parallel code, these can be used to pass information processors to coordinate or complete useful work. Here, we focus on the basic technique to pass these messages. The C source is shown in Listing 1.
Listing 1 - knock.c, or see knock.f
#include
#include
int main(int argc, char *argv[])
{
/*
this program demonstrates a very simple MPI program illustrating
communication between two nodes. Node 0 sends a message consisting
of 3 integers to node 1. Upon receipt, node 1 replies with another
message consisting of 3 integers.
*/
/* get definition of MPI constants */
#include "mpi.h"
/* define variables
ierror = error indicator
nproc = number of processors participating
idproc = each processor's id number (0 <= idproc < nproc)
len = length of message (in words)
tag = message tag, used to distinguish between messages */
int ierror, nproc, idproc, len=3, tag=1;
/* status = status array returned by MPI_Recv */
MPI_Status status;
/* sendmsg = message being sent */
/* recvmsg = message being received */
/* replymsg = reply message being sent */
int recvmsg[3];
int sendmsg[3] = {1802399587,1798073198,1868786465};
int replymsg[3] = {2003332903,1931506792,1701995839};

printf("Knock program initializing...\n");

/* initialize the MPI execution environment */
ierror = MPI_Init(&argc, &argv);
/* stop if MPI could not be initialized */
if (ierror)
exit(1);
/* determine nproc, number of participating processors */
ierror = MPI_Comm_size(MPI_COMM_WORLD,&nproc);
/* determine idproc, the processor's id */
ierror = MPI_Comm_rank(MPI_COMM_WORLD,&idproc);
/* use only even number of processors */
nproc = 2*(nproc/2);
if (idproc < nproc) {
/* even processor sends and prints message,
then receives and prints reply */
if (idproc%2==0) {
ierror = MPI_Send(&sendmsg,len,MPI_INT,idproc+1,tag,
MPI_COMM_WORLD);
printf("proc %d sent: %.12s\n",idproc,sendmsg);
ierror = MPI_Recv(&recvmsg,len,MPI_INT,idproc+1,tag+1,
MPI_COMM_WORLD,&status);
printf("proc %d received: %.12s\n",idproc,recvmsg);
}
/* odd processor receives and prints message,
then sends reply and prints it */
else {
ierror = MPI_Recv(&recvmsg,len,MPI_INT,idproc-1,tag,
MPI_COMM_WORLD,&status);
printf("proc %d received: %.12s\n",idproc,recvmsg);
ierror = MPI_Send(&replymsg,len,MPI_INT,idproc-1,tag+1,
MPI_COMM_WORLD);
printf("proc %d sent: %.12s\n",idproc,replymsg);
}
}
/* terminate MPI execution environment */
ierror = MPI_Finalize();
if (idproc < nproc) {
printf("hit carriage return to continue\n");
getchar();
}
return 0;
}
Discussing the parallel aspects of this code:

mpi.h - the header file for the MPI library, required to access information about the parallel system and perform communication
idproc, nproc - nproc describes how many processors are currently running this job and idproc identifies the designation, labeled using integers from 0 to nproc - 1, of this processor. This information is sufficient to identify exactly which part of the problem this instance of the executable should work on.
MPI_Init - performs the actual initialization of MPI, setting up the connections between processors for any subsequent message passing. It returns an error code; zero means no error.
MPI_COMM_WORLD - MPI defines communicator worlds or communicators that define a set of processors that can communicate with each other. At initialization, one communicator, MPI_COMM_WORLD, covers all the processors in the system. Other MPI calls can define arbitrary subsets of MPI_COMM_WORLD, making it possible to confine a code to a particular processor subset just by passing it the appropriate communicator. In simple cases such as this, using MPI_COMM_WORLD is sufficient.
MPI_Comm_size - accesses the processor count of the parallel system
MPI_Comm_rank - accesses the identification number of this particular processor
if (idproc%2==0) - Up until this line, the execution of main by each processor has been the same. However, idproc distinguishes each processor. We can take advantage of this difference by guiding the processor to a different execution as a function of idproc. This Boolean statement causes those processors whose idproc is even (0, 2, 4, ...) to execute the one part of the if-block, while the odd processors (1, 3, 5, ...) execute the second part of the block. At this point, no central authority is consulted for assignments; each processor recognizes what part of the problem it should perform using only its idproc.
MPI_Recv and MPI_Send - These elemental MPI calls perform communication between processors. In this case, the even processor, executing the first block of the if statement, begins by using MPI_Send to send the contents of the sendmsg array, whose size is given as 3 long integers, specified by len and MPI_INT, to the next processor, specified by idproc+1. tag merely identifies the message for debugging purposes, and must match the tag on the corresponding receive. MPI_Send returns when the MPI library no longer requires that block of memory, allowing your code to execute, but the message may or may not have arrived at its destination yet.
Meanwhile, the odd processor uses MPI_Recv to receive the data from the even processor at idproc-1. This processor specifies that the message be written to recvmsg. The call returns information about the message in the status variable. MPI_Recv returns when it is finished with the recvmsg array.
Note that the even and odd processors might not be in lock step with one another. Such tight control is not needed to complete this task. One processor could reach one MPI call sooner than the other, but the MPI library performs the necessary coordination so that the message is properly handled while allowing each processor to continue with work as soon as possible. The processors affect one another only when the communicate.
After printing the first message, each processor switches roles. Now the odd processors call MPI_Send to send back a reply in replymsg, and the even processors call MPI_Recv to receive the odd processors' reply in their recvmsg. For debugging, the tag used is incremented on all processors. Note that this is a distributed-memory model, so the instance of recvmsg on one processor resides in the memory on one machine while the other instance of recvmsg resides on a completely different piece of hardware. Here we are using the even processors' recvmsg for the first time while the odd processors' recvmsg holds data from the first message. Finally, each processor prints this second message.
MPI_Finalize - performs the properly cleanup and close the connections between codes running on other processors and release control.
When this code, with the initialization values in sendmsg and replymsg, is run in parallel on two processors, the code produces this output:
Processor 0 output
Processor 1 output
Knock program initializing...
proc 0 sent: knock,knock!
proc 0 received: who's there?
hit carriage return to continue
Knock program initializing...
proc 1 received: knock,knock!
proc 1 sent: who's there?
hit carriage return to continue
It performs the opening lines of the classic "knock, knock" joke. This example is an appropriate introduction because that joke requires the first message to be sent one way, then the next message the other way. After each message pass, all processors print the data they have showing how the conversation is unfolding.

Conclusion
The purpose of this discussion was to highlight the basic techniques needed to perform elemental message passing between processors. It also shows how one identification number, idproc, can be used to cause different behavior by distinguishing between processors. The overhead of initialization (MPI_Init et al) and shut down (MPI_Finalize) seems large compared to the actual message passing (MPI_Send and MPI_Recv) only because the nature of the message passing is so simple. In a real-world parallel code, the message-passing calls can be much more complex and interwoven with execution code. We chose a simple example so attention could be drawn to the communications calls. The reader is welcome to explore how to integrate these calls into their own code

0 comments;Click here for request info on this topic:

Post a Comment