Performance and Network Latency

From HPCBugBase


In a message-passing model, communication overhead is often a performance bottleneck. The time it takes to send a message from one process to another is approximately (latency + size / bandwidth). When the network is slow, bandwidth is the major limiting factor. However, recent advances in network technology have made connections 10x to 100x faster, which has steadily eased that constraint. Latency, on the other hand, is becoming a serious issue. For example, an MPI program on a grid infrastructure can run across geographically distant sites. Because a signal cannot travel faster than the speed of light, latency inevitably grows as the distance between the MPI nodes increases. Network protocol overhead adds to the latency as well.
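The cost model above can be sketched in a few lines of Python. This is only an illustration: the function names, the constant name, and the example numbers are assumptions chosen for the sketch, not measurements, and the distance bound ignores protocol overhead and the slower signal speed in real cables.

```python
# Illustrative sketch of the cost model: time = latency + size / bandwidth.
# All names and example values here are hypothetical.

SPEED_OF_LIGHT_M_PER_S = 299_792_458.0  # speed of light in vacuum


def transfer_time(latency_s, size_bytes, bandwidth_bytes_per_s):
    """Estimated time to deliver one message, in seconds."""
    return latency_s + size_bytes / bandwidth_bytes_per_s


def min_latency(distance_m):
    """Lower bound on one-way latency imposed by the speed of light."""
    return distance_m / SPEED_OF_LIGHT_M_PER_S


def latency_share(latency_s, size_bytes, bandwidth_bytes_per_s):
    """Fraction of the total transfer time that is due to latency."""
    total = transfer_time(latency_s, size_bytes, bandwidth_bytes_per_s)
    return latency_s / total


if __name__ == "__main__":
    # Same 1 MB message and 1 msec latency, on a slow and a fast link.
    for bandwidth in (1e8, 1e10):  # bytes per second (assumed values)
        t = transfer_time(0.001, 1e6, bandwidth)
        share = latency_share(0.001, 1e6, bandwidth)
        print(f"bandwidth={bandwidth:.0e} B/s  time={t:.4f} s  latency share={share:.0%}")
    # Physical floor on one-way latency between sites 3000 km apart.
    print(f"min latency over 3000 km: {min_latency(3_000_000) * 1e3:.1f} msec")
```

With these assumed numbers, raising the bandwidth by 100x cuts the transfer time from about 11 msec to about 1.1 msec, but latency then accounts for most of what remains, which is the shift described above.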

According to one practitioner, 10 msec one way (20 msec round trip) is the maximum latency at which a program can run without a significant performance problem. This is an empirically derived limit, and higher latencies can be partly compensated for with programming "tricks", such as overlapping communication with computation. Further research is necessary to collect more evidence.
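The benefit of overlapping communication with computation can be estimated with a simple, idealized model. The sketch below assumes that the computation does not depend on the data in flight and that the overlap is perfect; both function names and the example timings are hypothetical. In MPI, this kind of overlap is typically written with nonblocking calls such as MPI_Isend and MPI_Irecv, completed later with MPI_Wait.

```python
# Idealized per-step timing model; names and numbers are illustrative only.


def step_time_blocking(compute_s, comm_s):
    """Communication completes before computation starts: the costs add up."""
    return compute_s + comm_s


def step_time_overlapped(compute_s, comm_s):
    """Communication runs alongside independent computation (perfect overlap)."""
    return max(compute_s, comm_s)


if __name__ == "__main__":
    compute_s = 0.050  # assumed computation per step
    comm_s = 0.020     # the 20 msec round trip quoted above
    print(f"blocking:   {step_time_blocking(compute_s, comm_s) * 1e3:.0f} msec")
    print(f"overlapped: {step_time_overlapped(compute_s, comm_s) * 1e3:.0f} msec")
```

Under these assumptions the round trip is fully hidden as long as each step has at least as much independent computation as communication. Once the communication time exceeds the computation time, the step time is bounded by the latency again.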
