Sunday, August 2, 2015

It's Worse Than You Think

On why you cannot reason about data races:

Many people reason about lockless/CAS-less code and try to consider how a race condition can be made to be harmless.

If one is lucky, races can lead to duplicated work in the worst case.

Ex:
x.data = new list(1, 2, 3), x.length = 3;
x.data = new list ();
x.length = 0;


The problem arises when there are multiple related references that must be changed, or values which rely upon one another. The interleaving of threads can cause random absurdity. "I can still handle a race in that example," you say: "I'll figure out at run-time if the data is inconsistent and wait until it is. If the list is empty and the length field is larger than 0, wait."

Modern superscalar CPUs will dispatch the instructions to execution units in-order but execute them out-of-order. Where a lack of dependency between two instructions can be guessed by the CPU, it doesn't promise any ordering between them. Compilers can behave similarly when aggressively optimizing; this is why modern C is performant. The problem is that across threads, the order of reads and writes can be arbitrarily changed as long as they are correct with respect to others in the same executing thread.

In your racy list example, the length can be set to 0 before the data is set to the new list. That's right, the order that you see on the screen between synchronization primitives means very little.

To really understand this problem, I cannot recommend the following paper from the POPL conference heavily enough. It's a pretty good description and proof of what we know the consistency guarantees of x86 concurrency to be.

It's worse than you think.

The quote from this paper that best bursts the Dunning–Kruger effect is:




Paper: http://www.cl.cam.ac.uk/~pes20/weakmemory/popl09.pdf

Memory consistency in multithreaded applications that is not bought by synchronization primitives and memory fences cannot be had for free. Furthermore, the worst failures had by eliding these primitives will not happen at compile-time or test-time, they will be black swan events dependent on everything from network speed to the ambient solar radiation. 

No comments:

Post a Comment