“I don’t have much threading experience, I work on web applications.”
Ever heard this? Me too. In fact, I’ve heard it come from my mouth a time or two. My exposure to threading is limited as I have traditionally operated within the web applications realm, however it is important that even web application engineers understand how threads work and the importance of writing thread safe code.
The example
Let’s say we have a contrived web application that stores a set of data in the session. For our purposes we will assume that the sample application is storing email addresses in the session (which it is not – they are just dummy strings). The users of the application can perform many different functions on the stored data but two are prevalent: removing addresses from the collection and sending mail to each address.
When the user selects to remove an item from the collection the storage object is retrieved from the session and the item is found, then removed. When the user selects to send mail to each item in the collection the storage object is retrieved from the session and iterated over. As the application iterates through the collection it hits the database to retrieve the appropriate message and to update the user record with the number of emails they have been sent (the sample application just does a Thread.sleep(300) to simulate this).
The problem
Because of database lookups, socket creation, and more, sending the emails to the entire list can take a while – roughly 300 milliseconds per item – so many users right click and open this in a new tab while they continue the rest of their work. To simulate this the sample application has a link on the main page that pops open two windows, simulating the user removing an address from the list and sending emails to all in the list at the same time.
When the non thread safe version of the code runs the remove functionality deletes an item from the collection while the send functionality is still iterating through the items. This will, in most cases, cause an IndexOutOfBoundsException to be thrown. To fix this we simply must change the send code so that it either makes a defensive copy first or does something else to accommodate the possibility of the underlying data changing. In our case we will implement the iterations with an Iterator.
The code
Comments, imports, and more in all code shown in this post has been removed for brevity – download the full example for the full thing.
The example is comprised of two main objects: GoodObject and BadObject. GoodObject takes into account the fact that the underlying data set may change while BadObject does not. Both extend BaseObject in order to share some common functionality:
public abstract class BaseObject {
protected List data = new CopyOnWriteArrayList();
public void add(String toAdd) {
data.add(toAdd);
}
public boolean remove(String toRemove) {
boolean toReturn = data.remove(toRemove); // For logging...
return toReturn;
}
public abstract void doSomething();
public String toString() {
StringBuilder sb = new StringBuilder();
for (int i = 0;i < data.size();i++) {
sb.append(data.get(i) + " ");
}
return sb.toString();
}
protected void simulateSomething(Object obj) {
try {
Thread.sleep(300);
} catch (Exception e) { /* Don't Care... */ }
}
}
In the BaseObject the add(), remove, and simulateSomething() methods are all shared and the doSomething() method will perform an action in a good and bad way. Following our example above, the doSomething() method would do some database lookups and send an email in real life.
The BadObject implementation of the doSomething() method fails to recognize the fact that the underlying collection can be updated in a non-synchronized way:
public void doSomething() {
int len = data.size();
simulateSomething(null);
for (int i = 0;i < len;i++) {
simulateSomething(data.get(i));
}
}
The failure point here comes if the remove() call is made after the call to data.size() is. Since the size is gathered then the item is removed, the for loop will inevitably try to access an elements outside the collection bounds, resulting in an exception. The same case would ring true if the collection needed to be iterated backwards for some reason.
The GoodObject implementation uses Java’s Iterator class to handle this problem:
public void doSomething() {
int len = data.size();
simulateSomething(null);
Iterator<String> items = data.iterator();
while (items.hasNext()) {
simulateSomething(items.next());
}
}
Since the iteration of the collection is no longer tied to an index, we won’t be able to hit a non-existent element and have corrected the problem.
The lesson
While this is a contrived example, it is not too far fetched. I have seen code blow up countless times because it was unable to handle something as simple as multiple windows being open. Other times I have seen weird behavior, such as each window mirroring the other because the application navigation was session bound. Likewise, I have seen code in many different organizations that did not take these things into account.
If you are writing a web application think to yourself: what is going to happen if the user opens something in a new window, thus creating two access points to the same session?