Data Races with Example

There have been many infamous cases of software failures because of simple bugs. Data races is one of such bugs, that may not only lead to monetary losses but may also have life threatening repercussions. Therac-25, a radiation therapy machine that was used to treat patients with cancer in the past had a concurrency bug which was triggered on some particular conditions. When the bug was triggered the machine delivered lethal doses radiations to its patients, which led to deaths of over 5 patients.
Another famous example of software failure caused due to a data race is the Paypal bug that caused a deposit of amount $92 quadrillion into a Paypal user's account, and that is a lot of money!

So, what is a data race?
In a multi-threaded program, when two or more threads access the same shared variable (global or static) concurrently or at the same time where atleast one of the threads performs a write operation leads to a data race. As a data race may wrongly update the value of the variable it can later lead to an error. But not all the races are harmful, some races are intentionally introduced  by the developers to improve performance of the system.

To sum up, data race occurs when all three of the listed points are satisfied:

1. Two or more threads must be accessing a shared data.
2. At least one of them performs a write operation.
3. The accesses or operations are not secured inside locks.

As displayed in the above example, Threads T1 and T2 are both trying to append the shared variable 'i' by 6. Before T1's update is reflected in the memory T2 picks up the old value of 'i' and appends it. Now, irrespective of which thread updates the memory last, 'i' will contain a wrong value.

The picture below represents the correct execution of the above example, where thread T1 updates the value and writes in the memory first followed by T2 reading the updated value. To guarantee race free executions. developers employ different techniques. Using locks is one of these. In locking, if some thread say Tx holds some lock over a critical section where a shared variable is getting modified, only Tx can execute the critical section and no other thread can start the execution before Tx has released the lock. Some other famous mechanisms to avoid data races are  atomic variables, semaphores and mutexes.

No locks are required in the case where concurrent accesses or updates do not cause any anomaly. Indeed these cases can be exploit to improve the program performance. For example, if in above example the variable 'i' was getting assigned a same value by both the threads say 'y', it would not cause any error no matter what order the threads are executed.

When creating a data race detection system detection system, it becomes important to alert the user for only harmful races and minimize the false positives (Notifying the presence of a data race when it is not present). For this, designing a scheduler is recommended which can replay all the interleavings to determine if there exists a particular interleaving for a data race that can cause an a error in the future.