8

Preconditions (generic description):

1. static class field

static List<String> ids = new ArrayList<>();

2. CompletableFuture#runAsync(Runnable runnable,Executor executor)

called within static void main(String args[]) method

3. elements added to someCollection inside of runAsync call from step2

Code snippet (specific description):

private static List<String> ids = new ArrayList<>();

public static void main(String[] args) throws ExecutionException, InterruptedException {
    //...
    final List<String> lines = Files.lines(path).collect(Collectors.toList());
    for (List<String> lines : CollectionUtils.split(1024, lines)) {
         CompletableFuture<Void> future = CompletableFuture.runAsync(() -> {
             List<User> users = buildUsers();
             populate(users);
         }, executorService);

        futures.add(future);
    }

    private static void populate(List<User> users){
       //...
       ids.add(User.getId);
       //...
    }
}

Problem description:

As I understand from concurrency point of view, static variable could NOT be shared between threads, so data can be lost some way.

Should it be changed to volatile or it would be reasonable to use ConcurrentSkipListSet<String> ?

Michael
  • 41,989
  • 11
  • 82
  • 128
sergionni
  • 13,290
  • 42
  • 132
  • 189
  • 5
    Stop using mutable static variables. Mutable statics are evil!!! – lance-java Jan 03 '18 at 10:36
  • 1
    In addition to what @lance-java says, your problem is actually that `ArrayList` is not thread-safe, and you don't have any synchronization to access it. So you are corrupting its internal data structure. – Didier L Jan 04 '18 at 11:07
  • @DidierL thank you for hint, I've started using `ConcurrentSkipListSet` is that looks ok? – sergionni Jan 04 '18 at 14:10
  • 2
    I don't know your requirements but you should probably look at [Is there a concurrent List in Java's JDK?](https://stackoverflow.com/questions/6916385/is-there-a-concurrent-list-in-javas-jdk). A `ConcurrentLinkedQueue` might be a better fit. – Didier L Jan 04 '18 at 14:17
  • @DidierL thanks, I will look into it – sergionni Jan 04 '18 at 15:30

3 Answers3

3

Based on the code snippet:

  • volatile is not required here because it works on reference level, while the tasks don't update the reference of the collection object, they mutate its state. Would the reference be updated, either volatile or AtomicReference might have been used.

  • Static object can be shared between threads, but the object must be thread-safe. A concurrent collection will do the job for light to medium load.

But the modern way to do this would involve streams instead of using a shared collection:

List<CompletableFuture<List<String>>> futures = lines.stream()
        .map(line -> CompletableFuture.supplyAsync(() -> buildUsers().stream()
                                                                     .map(User::getId)
                                                                     .collect(Collectors.toList()),
             executorService))
        .collect(Collectors.toList());

ids.addAll(futures.stream()
                  .map(CompletableFuture::join)
                  .flatMap(List::stream)
                  .collect(Collectors.toList()));
jihor
  • 2,478
  • 14
  • 28
3

In your particular case there are ways to guarantee thread safety for ids:

  1. Use thread-safe collection (for example, ConcurrentSkipListSet, CopyOnWriteArrayList, Collections.synchronizedList(new ArrayList(), Collections.newSetFromMap(new ConcurrentHashMap());
  2. Use synchronization as shown below.

Examples of synchronized:

private static synchronized void populate(List<User> users){
  //...
  ids.add(User.getId);
  //...
}

private static void populate(List<User> users){
  //...
  synchronized (ids) {
      ids.add(User.getId);
  }
  //...
}

I assume that it would be the fastest to use Collections.newSetFromMap(new ConcurrentHashMap(), if you expect a lot of user ids. Otherwise, you would be familiar with ConcurrentSkipListSet.

volatile is a bad option here. Volatile guarantees visibility, but not atomicity. The typical examples of volatile usage are

 volatile a = 1

 void threadOne() {
      if (a == 1) {
           // do something
      }
 }

 void threadTwo() {
      // do something 
      a = 2
 }

In that case, you do only write/read operations once. As "a" is volatile, then it is guaranteed that each thread "see" (read) full exactly 1 or 2. Another (bad example):

 void threadOne() {
      if (a == 1) {
           // do something
           a++;
      }
 }

 void threadTwo() {
      if (a == 1) {
           // do something
           a = 2
      } else if (a == 2) {
           a++
      }
 }

Here we do increment operation (read and write) and there are could be different results of a, because we don't have atomicity. That's why there are AtomicInteger, AtomicLong, etc. In your case, all threads would see the write value ids, but they would write different values and if you see inside "add" method of ArrayList, you will see something like:

elementData[size++] = e;

So nobody guarantees atomicity of size value, you could write different id in one array cell.

egorlitvinenko
  • 2,736
  • 2
  • 16
  • 36
2

In terms of thread safety it doesn't matter whether the variable static or not. What matters are

  1. Visibility of shared state between threads.
  2. Safety (preserving class invariants) when class object is used by multiple threads through class methods.

Your code sample is fine from visibility perspective because ids is static and will be initialized during class creation time. However it's better to mark it as final or volatile depending on whether ids reference can be changed. But safety is violated because ArrayList doesn't preserve its invariants in multithreaded environment by design. So you should use a collection which is designed for accessing by multiple threads. This topic should help with the choice.

Mikita Harbacheuski
  • 2,193
  • 8
  • 16