The fact that Haskell's parallel programming capabilities work so well is a minor miracle. I didn't quite appreciate it until understanding the spineless tagless G-Machine. Haskell's model of computation, the use of repeated pattern matching to expand and reduce a runtime stack seems particularly poorly suited for parallelism.
Purity does decrease the penalty of duplicated work, but concurrent evaluation of thunks means that the risk of duplicating work is pretty high. The opposite extreme is even more unacceptable. Use of CAS, or locking even, would add an overhead to every evaluation that in sum is just too much.
The mentioned paper (http://research.microsoft.com/pubs/67424/2005-haskell.pdf) details the cleverness behind this feat of engineering.
Performance demands we simply avoiding locking entirely in the case that evaluation does not use unsafePerformIO. This leaves us with the problem of mitigating the possible performance losses due to duplicate evaluation.
Blackholing is used to signify that a thunk has been entered, and that it should not be re-entered by another function. Because of the need for CAS primitives, a clever way to reduce the frequency of it is used. Every once in a while a function will scan the stack of values that it is currently evaluating, and will CAS in the black hole. When another function tries to enter this black hole, it simply blocks on A. If the running thread finds out that it's inside of a function already blackholed, it can truncate the stack and discard items newer than that blackhole. This ensures that probabilistically it is the longer-running and thus more expensive evaluations that will not be duplicated.
This can be further optimized by writing in a "grey hole" on entry to the function in a lockless way. This doesn't prevent all work-duplicating races but it does prevent ones that happen after a memory fence. This shortening of the race's time window decreases the amount of duplicated work. Unconditional greyholing does have a performance cost though, requiring a write on entering even the cheapest evaluations.