Corrupted File Output

From HPCBugBase


Fault Description

HPC applications often write intermediate or final results to a file. If multiple processes or threads write to the same file at the same time, their output can interleave or overwrite each other, and the file content becomes corrupted. The fault can arise from code as simple as the following.

FILE *fp = fopen(filename, "a");
fprintf(fp, ...);
fclose(fp);

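Notice that the example contains no explicitly parallel code and works correctly in a sequential program. The fault appears only when several processes or threads execute it concurrently on the same file.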

Statistics (Frequency)

Other Findings and Contexts

A naive way to fix this defect in MPI is to serialize file access, for example with a barrier, so that only one rank writes at a time.

for (i=0; i<size; i++) {
  MPI_Barrier(MPI_COMM_WORLD);
  if (i == rank) {
    FILE *fp = fopen(filename, "a");
    fprintf(fp, ...);
    fclose(fp);
  }
}

Alternatively, the data can be transferred to a single process, which performs all file access.

// Send output to rank 0
if (rank == 0) {
  FILE *fp = fopen(filename, "a");
  fprintf(fp, ...);
  fclose(fp);
}

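In either approach, the communication overhead is often too high, because the writes are serialized and every process must either synchronize with the others or ship its data to a single rank. The best solution appears to be to have each process write to its own file, which avoids conflicts altogether.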

filename = ... /* output file for this process */
FILE *fp = fopen(filename, "a");
fprintf(fp, ...);
fclose(fp);
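
The snippet above leaves the choice of per-process file name open. As a minimal sketch, assuming a naming scheme of the form base.NNNNN with a zero-padded rank (a hypothetical convention, not one prescribed by this entry), the name could be built and used as follows. The helper names rank_filename and append_rank_line are likewise made up for illustration.

```c
#include <stdio.h>

/* Hypothetical helper: writes "<base>.<rank>" (rank zero-padded to five
   digits) into buf. Returns 0 on success, -1 if buf is too small. */
int rank_filename(char *buf, size_t buflen, const char *base, int rank)
{
    int n = snprintf(buf, buflen, "%s.%05d", base, rank);
    return (n < 0 || (size_t)n >= buflen) ? -1 : 0;
}

/* Appends one line to the file owned by `rank`. No other process opens
   this file, so no coordination between processes is required.
   Returns 0 on success, -1 on error. */
int append_rank_line(const char *base, int rank, const char *line)
{
    char filename[4096];
    if (rank_filename(filename, sizeof filename, base, rank) != 0)
        return -1;

    FILE *fp = fopen(filename, "a");
    if (fp == NULL)
        return -1;

    int ok = fprintf(fp, "%s\n", line) >= 0;
    if (fclose(fp) != 0)
        ok = 0;
    return ok ? 0 : -1;
}
```

In an MPI program the rank argument would come from MPI_Comm_rank. Zero-padding keeps the files in rank order when they are listed alphabetically, which simplifies later processing.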

File access can still cause performance problems, depending on how the filesystem is organized, but this appears to be the standard approach in production-scale HPC applications. Its disadvantage is that the per-process files must be combined afterwards, which can require complicated post-processing.
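
In the simplest case, the post-processing amounts to concatenating the per-process files in rank order. The following is a minimal sketch of that step only, assuming the files are named base.00000, base.00001, and so on (a hypothetical naming scheme); real applications may need to sort, deduplicate, or reformat the records instead. The function name merge_rank_files is made up for illustration.

```c
#include <stdio.h>

/* Hypothetical post-processing step: concatenates base.00000 through
   base.(nranks-1) into `merged`, in rank order.
   Returns 0 on success, -1 if any file cannot be read or written. */
int merge_rank_files(const char *base, int nranks, const char *merged)
{
    FILE *out = fopen(merged, "w");
    if (out == NULL)
        return -1;

    int status = 0;
    for (int rank = 0; rank < nranks && status == 0; rank++) {
        char filename[4096];
        int n = snprintf(filename, sizeof filename, "%s.%05d", base, rank);
        if (n < 0 || (size_t)n >= sizeof filename) {
            status = -1;
            break;
        }

        FILE *in = fopen(filename, "r");
        if (in == NULL) {
            status = -1;  /* a missing per-rank file is treated as an error */
            break;
        }

        /* Copy the file contents in fixed-size chunks. */
        char chunk[4096];
        size_t got;
        while ((got = fread(chunk, 1, sizeof chunk, in)) > 0) {
            if (fwrite(chunk, 1, got, out) != got) {
                status = -1;
                break;
            }
        }
        if (ferror(in))
            status = -1;
        fclose(in);
    }

    if (fclose(out) != 0)
        status = -1;
    return status;
}
```

This would typically run once, sequentially, after the parallel job has finished, so it adds no synchronization to the application itself.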

Pages referring to this entry: I/O Hotspots Race 
