Potential Deadlock

From HPCBugBase


Fault Description

A potential deadlock is a deadlock that may or may not occur, depending on runtime conditions and on the implementation of the compiler or library.

Because the failure does not always occur, developers and users may not notice the defect until they run the code in a different hardware or software environment, or with different parameters.

Potential Deadlock in MPI

MPI_Send() and MPI_Recv() are the most basic point-to-point communication functions in MPI, and most novices learn them first (see Six Most Basic MPI Functions). They are also a common source of potential deadlocks.

MPI defines four communication modes for blocking sends (see the MPI standard for the precise definitions):

  • standard (MPI_Send): the call may either return as soon as the outgoing message has been copied into an MPI buffer, or block until a matching receive has been posted and the message has been sent.
  • buffered (MPI_Bsend): the send completes as soon as the outgoing message has been copied into the buffer the user attached with MPI_Buffer_attach. An error is returned if there is insufficient buffer space.
  • synchronous (MPI_Ssend): the send completes only after the matching receive has been posted and has started to receive the message.
  • ready (MPI_Rsend): the send may be started only if the matching receive has already been posted; otherwise the program is erroneous.

As the list shows, for a standard MPI_Send() the MPI implementation decides whether the call blocks, possibly based on the message size, the available buffer space, and other performance considerations. The MPI standard recommends buffering rather than blocking the sender whenever possible for standard sends, so in many cases MPI_Send() returns immediately. This behavior is not guaranteed, however. A program written on the assumption that MPI_Send() never blocks can therefore hang when it runs under different conditions, for example with larger messages or with a different MPI implementation.

For example, the following code fragment runs correctly if MPI_Send returns immediately, but it deadlocks if MPI_Send blocks: both ranks wait inside MPI_Send for the other rank to post a receive, and neither ever reaches its MPI_Recv. This type of deadlock is called a potential deadlock. (An equivalent example appears in the MPI standard.)

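/* sendbuf, recvbuf and count are assumed to be declared elsewhere */
if (rank == 0) {
   MPI_Send(sendbuf, count, MPI_INT, 1, 0, MPI_COMM_WORLD);   /* may block */
   MPI_Recv(recvbuf, count, MPI_INT, 1, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
}
else if (rank == 1) {
   MPI_Send(sendbuf, count, MPI_INT, 0, 0, MPI_COMM_WORLD);   /* may block */
   MPI_Recv(recvbuf, count, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
}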

Statistics (Frequency)

The table below shows the results of a defect analysis conducted at the University of Maryland. The data were collected from 23 students in three HPC courses who solved the Game of Life problem with MPI and C/C++ as a course assignment. The results show that 6 of the 23 students (26%) left a potential deadlock in their final submissions.

Analysis results for potential deadlock in MPI

Analysis result                               Occurrence
Committed a potential deadlock defect         6
Avoided potential deadlock by MPI_Sendrecv    2
Avoided potential deadlock by non-blocking MPI_Isend and/or MPI_Irecv
    Correct                                   6
    Missing MPI_Wait                          1
Used MPI_Send/Recv without potential deadlock
    Correct                                   3
    Scheduling problem                        4
    Not nearest neighbor                      1
Total                                         23

Other Findings and Contexts

MPI-CHECK (http://andrew.ait.iastate.edu/HPC/MPI-CHECK.htm), developed at Iowa State University, is known to provide runtime checking for potential deadlocks in MPI programs written in Fortran 77 and Fortran 90.

Pages referring to this entry: Bottleneck in Message Scheduling; Deadlock; Insights on the Potential Deadlock; Six Most Basic MPI Functions
