Corrupted File Output
From HPCBugBase
Fault Description
HPC applications often write intermediate or final results to a file. If multiple processes or threads write to the same file at the same time, their output can interleave and the file content can be corrupted. The defect can be as simple as the following:
FILE *fp = fopen(filename, "a");
fprintf(fp, ...);
fclose(fp);
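Note that the example contains no "parallel code" and works correctly in a sequential program. The defect appears only when several processes of a parallel program execute these lines against the same file.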
Statistics (Frequency)
Other Findings and Contexts
A naive way to resolve this defect in MPI is to serialize file access, for example by letting the processes take turns with a barrier between turns:
for (i = 0; i < size; i++) {
    MPI_Barrier(MPI_COMM_WORLD);
    if (i == rank) {
        FILE *fp = fopen(filename, "a");
        fprintf(fp, ...);
        fclose(fp);
    }
}
Alternatively, the data can be transferred to one process, which performs all file access:
// Send output to rank 0
if (rank == 0) {
    FILE *fp = fopen(filename, "a");
    fprintf(fp, ...);
    fclose(fp);
}
With either approach, the communication overhead is often too high. The best solution appears to be for each process to write to a different file, which avoids conflicts:
filename = ... /* output file for this process */
FILE *fp = fopen(filename, "a");
fprintf(fp, ...);
fclose(fp);
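One possible way to derive a distinct file name for each process is sketched below. The helper rank_filename and the "<prefix>.<rank>" naming scheme are assumptions made for illustration and do not come from this entry; in an MPI program the rank argument would typically be the value obtained from MPI_Comm_rank.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical helper: build a per-process file name of the form
 * "<prefix>.<rank>", with the rank zero-padded to at least four digits
 * (for example "result.0007").
 * Returns the length of the name on success, or -1 if it does not fit in buf.
 */
int rank_filename(char *buf, size_t buflen, const char *prefix, int rank)
{
    assert(buf != NULL && prefix != NULL && rank >= 0);

    int n = snprintf(buf, buflen, "%s.%04d", prefix, rank);
    if (n < 0 || (size_t)n >= buflen)
        return -1;   /* encoding error or name truncated */
    return n;
}
```

A process would call rank_filename(name, sizeof name, "result", rank) once, check the return value, and then pass name to fopen as in the snippet above. Zero-padding the rank keeps the files in rank order when they are listed or sorted by name.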
File access can still cause performance problems, depending on how the filesystem is organized, but this appears to be a standard approach in real, large-scale HPC applications. The disadvantage is that it requires complicated post-processing of the per-process files.
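The simplest form of such post-processing is to concatenate the per-process files in rank order once the parallel run has finished. The sketch below assumes that the files are named "<prefix>.0000", "<prefix>.0001", and so on, and that plain concatenation is an acceptable result; the helper merge_rank_files and the naming scheme are hypothetical. Real applications often need more than this, for example reordering or deduplicating records.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical helper: concatenate "<prefix>.0000" ... "<prefix>.<nranks-1>"
 * (rank zero-padded to four digits) into merged_path, in rank order.
 * Returns the number of bytes copied, or -1 on any I/O error.
 */
long merge_rank_files(const char *prefix, int nranks, const char *merged_path)
{
    assert(prefix != NULL && merged_path != NULL && nranks > 0);

    FILE *out = fopen(merged_path, "wb");
    if (out == NULL)
        return -1;

    long total = 0;
    for (int rank = 0; rank < nranks; rank++) {
        char name[4096];
        int n = snprintf(name, sizeof name, "%s.%04d", prefix, rank);
        if (n < 0 || (size_t)n >= sizeof name) {
            fclose(out);
            return -1;               /* file name too long */
        }

        FILE *in = fopen(name, "rb");
        if (in == NULL) {
            fclose(out);
            return -1;               /* a per-process file is missing */
        }

        /* Copy this rank's file to the merged output in fixed-size chunks. */
        char chunk[8192];
        size_t got;
        while ((got = fread(chunk, 1, sizeof chunk, in)) > 0) {
            if (fwrite(chunk, 1, got, out) != got) {
                fclose(in);
                fclose(out);
                return -1;
            }
            total += (long)got;
        }

        int read_failed = ferror(in);
        fclose(in);
        if (read_failed) {
            fclose(out);
            return -1;
        }
    }

    /* fclose flushes buffered data, so its result must be checked too. */
    return (fclose(out) == 0) ? total : -1;
}
```

Because this runs sequentially after the job, only one process touches the merged file, so the original defect cannot recur at this stage.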
Pages referring to this entry: I/O Hotspots Race
