4

I have following use-case:

  • Need a single background thread which maintains a set of accountIDs in memory and it fetches the latest accountIds every 1 second
  • Other multiple parallel running process will search if particular accountID is present in above Set or not

To achieve above use-case, I have following code. runTask() is a method which is responsible for fetching new Set every 1 second. doesAccountExist method is called by other parallel threads to check if accountId exists in the Set or not.

class AccountIDFetcher {
 private Set<String> accountIds;
 private ScheduledExecutorService scheduledExecutorService;

 public AccountIDFetcher() {
   this.accountIds = new HashSet<String>();
   scheduledThreadPoolExecutor = new ScheduledThreadPoolExecutor(1);
   scheduledExecutorService.scheduleWithFixedDelay(this::runTask, 0, 1, TimeUnit.SECONDS);
 }

// Following method runs every 1 second
 private void runTask() {
   accountIds = getAccountIds()
 }

 // other parallel thread calls below method
 public boolean doesAccountExist(String accountId) {
   return accountIds.contains(instanceId);
 }

 private Set<String> getAccountIds() {
   Set<String> accounts = new HashSet<String>();
   // calls Database and put list of accountIds into above set
   return accounts;
 }

}

I have following question

  1. In runTask method, I just change the reference of accountIds variable to a new object. So if, Thread-2 is in the middle of searching accountId in doesAccountExist() method and at the same time if Thread-1 executes runTask() & changes the reference of accountIds variable to a new object then old object gets orphaned. is it possible that old object can be garbage collected before Thread-2 finish searching in it?
mihir S
  • 617
  • 3
  • 8
  • 23

3 Answers3

3

tl;dr

You asked:

is it possible that old object can be garbage collected before Thread-2 finish searching in it?

No, the old Set object does not become garbage while some thread is still using it.

An object only becomes a candidate for garbage-collection after each and every reference to said object (a) goes out of scope, (b) is set to null, or (c) is a weak reference.

No, an object in use within a method will not be garbage-collected

In Java, an object reference assignment is atomic, as discussed in another Question. When this.accountIds is directed to point to a new Set object, that happens in one logical operation. That means that any other code in any other thread accessing the accountIds member field will always successfully access either the old Set object or the new Set object, always one or the other.

If during that re-assignment another thread accessed the old Set object, that other thread's code is working with a copy of the object reference. You can think of your doesAccountExist method:

 public boolean doesAccountExist(String accountId) {
   return accountIds.contains(accountId);
 }

…as having a local variable with a copy of the object reference, as if written like this:

 public boolean doesAccountExist(String accountId) {
   Set<String> set = this.accountIds ;
   return set.contains(accountId);
 }

While one thread is replacing the reference to a new Set on the member field accountIds, the doesAccountExist method already has a copy of the reference to the old Set. At that moment, while one thread is changing the member field reference, and another thread has a local reference, the garbage collector sees both the new and old Set objects as having (at least) one reference each. So neither is a candidate for being garbage-collected.

Actually, more technically correct would be explaining that at the point in your line return accountIds.contains(accountId); where execution reaches the accountIds portion, the current (old) Set will be accessed. A moment later the contains method begins its work, during which re-assigning a new Set to that member field has no effect on this method's work-in-progress already using the old Set.

This means that even after the new Set has been assigned in one thread, the other thread may still be continuing its work of searching the old Set. This may or may not be a problem depending on the business context of your app. But your Question did not address this stale-data transactional aspect.

So regarding your question:

is it possible that old object can be garbage collected before Thread-2 finish searching in it?

No, the old Set object does not become garbage while some thread is still using it.

Other issues

Your code does have other issues.

Visibility

You declared your member field as private Set<String> accountIds;. If you access that member field across threads on a host machine with multiple cores, then you have a visibility problem. The caches on each core may not be refreshed immediately when you assign a different object to that member field. As currently written, it is entirely possible that one thread accessing this.accountIds will gain access to the old Set object even after that variable was assigned the new Set object.

If you do not already know about the issues I mention, study up on concurrency. There is more involved than I can cover here. Learn about the Java Memory Model. And I strongly recommend reading and re-reading the classic book, Java Concurrency in Practice by Brian Goetz, et al.

volatile

One solution is to mark the member field as volatile. So, this:

private Set<String> accountIds;

…becomes this:

volatile private Set<String> accountIds;

Marking as volatile avoids a stale cache on a CPU core pointing to the old object reference rather than the new object reference.

AtomicReference

Another solution is using an object of AtomicReference class as the member field. I would mark it final so that one and only one such object is ever assigned to that member field, so the field is a constant rather than a variable. Then assign each new Set object as the payload contained within that AtomicReference object. Code wanting the current Set object calls a getter method on that AtomicReference object. This call is guaranteed to be thread-safe, eliminating the need for volatile.

Concurrent access to existing Set

Another possible problem with your code might be concurrent access to an existing Set. If you have more than one thread accessing the existing Set, then you must protect that resource.

One way to protect access to that Set is to use a thread-safe implementation of Set such as ConcurrentSkipListSet.

From what you have showed in the Question, the only access to the existing Set that I noticed is calling contains. If you are never modifying the existing Set, then merely calling contains across multiple threads may be safe — I just don't know, you'd have to research it.

If you intend to never modify an existing Set, then you can enforce that by using an unmodifiable set. One way to produce an unmodifiable set is to construct and populate a regular set. Then feed that regular set to the method Set.copyOf. So your getAccountIds method would look like this:

 private Set<String> getAccountIds() {
   Set<String> accounts = new HashSet<String>();
   // calls Database and put list of accountIds into above set
   return Set.copyOf( accounts );
 }

Return a copy rather a reference

There are two easy ways to avoid dealing with concurrency:

  • Make the object immutable
  • Provide a copy of the object

As for the first way, immutability, the Java Collections Framework is generally very good but unfortunately lacks explicit mutability & immutability in its type system. The Set.of methods and Collections.unmodifiableSet both provide a Set that cannot be modified. But the type itself does not proclaim that fact. So we cannot ask the compiler to enforce a rule such as our AtomicReference only storing an immutable set. As an alternative, consider using a third-party collections with immutability as part of its type. Perhaps Eclipse Collections or Google Guava.

As for the second way, we can make a copy of our Set of account IDs whenever needing access. So we need a getCurrentAccountIds method that goes into the AtomicReference, retrieves the Set stored there, and called Set.copyOf to produce a new set of the same contained objects. This copy operation is not documented as being thread-safe. So we should mark the method synchronized to allow only one copy operation at a time. Bonus: We can mark this method public to give any calling programmer access to the Set of account IDs for their own perusal.

    synchronized public Set < UUID > getCurrentAccountIds ( )
    {
        return Set.copyOf( this.accountIdsRef.get() ); // Safest approach is to return a copy rather than original set.
    }

Our convenience method doesAccountExist should call that same getCurrentAccountIds to obtain a copy of the set before doing its "contains" logic. This way we do not care whether or not the "contains" work is thread-safe.

Caveat: I am not satisfied with using Set.copyOf to as means to avoid any possible thread-safety issues. That method notes that if the passed collection being copied is already an unmodifiable set, then a copy may not be made. In real work I would use a Set implementation that guarantees thread-safety whether found bundled with Java or by adding a third-party library.

Do not use object within constructor

I do not like seeing the scheduled executor service appearing within your constructor. I see two issues there: (a) app lifecycle and (b) using an object within a constructor.

Creating the executor service, scheduling tasks on that service, and shutting down that service are all related to the lifecycle of the app. An object generally should not be aware of its lifecycle within the greater app. This account IDs provider object should know only how to do its job (provide IDs) but should not be responsible for putting itself to work. So your code is mixing responsibilities, which is generally a poor practice.

Another problem is that the executor service is being scheduled to immediately start using the very object that we are still constructing. Generally, the best practice is to not use an object while still under construction. You may get away with such use, but doing so is risky and is prone to leading to bugs. A constructor should be short and sweet, used just to validate inputs, establish required resources, and ensure the integrity of the object being birthed.

I did not pull the service out of your constructor only because I did not want to this Answer to go too far out into the weeds. However, I did make two adjustments. (a) I changed the initial delay on the scheduleWithFixedDelay call from zero to one second. This is a hack to give the constructor time to finish birthing the object before its first use. (b) I added the tearDown method to properly shutdown the executor service so its backing thread-pool does not continue running indefinitely in zombie fashion.

Tips

I suggest renaming your getAccountIds() method. The get wording in Java is usually associated with the JavaBeans convention of accessing an existing property. In your case you are generating an entirely new replacement set of values. So I would change that name to something like fetchFreshAccountIds.

Consider wrapping your scheduled task with a try-catch. Any Exception or Error bubbling up to reach the ScheduledExecutorService brings a silent halt to any further scheduling. See ScheduledExecutorService Exception handling.

Example code.

Here is a complete example of my take on your code.

Caveat: Use at your own risk. I am not a concurrency expert. This is meant as food-for-thought, not production use.

I used UUID as the data type of the account IDs to be more realistic and clear.

I changed some of your class & method names for clarity.

Notice which methods are private and which are public.

package work.basil.example;

import java.time.Duration;
import java.time.Instant;
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;
import java.util.UUID;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;

public class AccountIdsProvider
{
    // Member fields
    private AtomicReference < Set < UUID > > accountIdsRef;
    private ScheduledExecutorService scheduledExecutorService;

    // Constructor
    public AccountIdsProvider ( )
    {
        this.accountIdsRef = new AtomicReference <>();
        this.accountIdsRef.set( Set.of() ); // Initialize to empty set.
        this.scheduledExecutorService = Executors.newSingleThreadScheduledExecutor();
        scheduledExecutorService.scheduleWithFixedDelay( this :: replaceAccountIds , 1 , 1 , TimeUnit.SECONDS );  // I strongly suggest you move the executor service and the scheduling work to be outside this class, to be a different class’ responsibility.
    }

    // Performs database query to find currently relevant account IDs.
    private void replaceAccountIds ( )
    {
        // Beware: Any uncaught Exception or Error bubbling up to the scheduled executor services halts the scheduler immediately and silently.
        try
        {
            System.out.println( "Running replaceAccountIds. " + Instant.now() );
            Set < UUID > freshAccountIds = this.fetchFreshAccountIds();
            this.accountIdsRef.set( freshAccountIds );
            System.out.println( "freshAccountIds = " + freshAccountIds + " at " + Instant.now() );
        }
        catch ( Throwable t )
        {
            t.printStackTrace();
        }
    }

    // Task to be run by scheduled executor service.
    private Set < UUID > fetchFreshAccountIds ( )
    {
        int limit = ThreadLocalRandom.current().nextInt( 0 , 4 );
        HashSet < UUID > uuids = new HashSet <>();
        for ( int i = 1 ; i <= limit ; i++ )
        {
            uuids.add( UUID.randomUUID() );
        }
        return Set.copyOf( uuids ); // Return unmodifiable set.
    }

    // Calling programmers can get a copy of the set of account IDs for their own perusal.
    // Pass a copy rather than a reference for thread-safety.
    // Synchronized in case copying the set is not thread-safe.
    synchronized public Set < UUID > getCurrentAccountIds ( )
    {
        return Set.copyOf( this.accountIdsRef.get() ); // Safest approach is to return a copy rather than original set.
    }

    // Convenience method for calling programmers.
    public boolean doesAccountExist ( UUID accountId )
    {
        return this.getCurrentAccountIds().contains( accountId );
    }

    // Destructor
    public void tearDown ( )
    {
        // IMPORTANT: Always shut down your executor service. Otherwise the backing pool of threads may run indefinitely, like a zombie ‍.
        if ( Objects.nonNull( this.scheduledExecutorService ) )
        {
            System.out.println( "INFO - Shutting down the scheduled executor service. " + Instant.now() );
            this.scheduledExecutorService.shutdown();  // I strongly suggest you move the executor service and the scheduling work to be outside this class, to be a different class’ responsibility.
        }
    }

    public static void main ( String[] args )
    {
        System.out.println( "INFO - Starting app. " + Instant.now() );
        AccountIdsProvider app = new AccountIdsProvider();
        try { Thread.sleep( Duration.ofSeconds( 10 ).toMillis() ); } catch ( InterruptedException e ) { e.printStackTrace(); }
        app.tearDown();
        System.out.println( "INFO - Ending app. " + Instant.now() );
    }
}
Basil Bourque
  • 303,325
  • 100
  • 852
  • 1,154
  • thanks a lot @basil-bourque. The other thread which still may be searching in old Set is not a problem. In my app, it's fine to return stale data to that thread. But it was not fine for that thread to throw unnecessary exception if in the middle of search operation the Old Set was garbage collected. That's what i wanted to confirm – mihir S Feb 03 '21 at 02:06
  • it's either really late for me or this is not entirely correct. GC wise, sure, a strongly referenced instance is not going to be collected, agreed. But you can think that method can be re-written like this: `public boolean doesAccountExist(String accountId) { int size = accountIds.size(); for(int i=0;i – Eugene Feb 03 '21 at 04:11
  • @Eugene Yes, looping on a `Set` being modified by another thread is problematic. But that is not the problem asked in the Question. The Question was specifically about an object being garbage-collected when in use in one thread while being replaced as the target of an object reference in a member field. I do not even see any code in the Question related to modifying the `Set` after instantiation. The code seen in the Question swaps out one `Set` object for another, and accesses the member field only a single time when calling `contains`. So what part of the Answer is "not entirely correct"? – Basil Bourque Feb 03 '21 at 07:04
  • your answer suggests that "this is fine" to replace the `Set` reference like this. You imply that the other threads will just iterate their own copy, without any interference with the already changed copy. I have shown a very simple example (that you seem to agree with), where this will break. The example was supposed to prove that you have no idea what `contains` _might_ do internally and how it might iterate the `Set`. Adding visibility constraints and even copying is of no help - you are still updating a _shared_ instance. – Eugene Feb 03 '21 at 14:44
  • even your bold statement: "No, an object in use within a method will not be garbage-collected" is not correct, see this [Q&A](https://stackoverflow.com/questions/58714980/rejectedexecutionexception-inside-single-executor-service). you seem to confuse reachability and scope. – Eugene Feb 03 '21 at 15:28
0

Garbage collection is not the main issue with this code. The lack of any synchronization is the main issue.

If thread-2 is "searching" in something, it necessarily has a reference to that thing, so it's not going to get GC'd.

Why not use 'synchronized' so you can be sure of what will happen?

a guest
  • 462
  • 3
  • 5
  • In my app, it's fine for Thread-2 to return the stale data. we don't want to take a performance hit and use something like ConcurrentHashSet – mihir S Feb 03 '21 at 02:07
  • 2
    A "performance hit" for a once-a-second operation? – a guest Feb 03 '21 at 03:35
0

garbage collection wise, you will not get into any surprise, but not because the accepted answer implies. It is somehow more tricky.

May be visualizing this will help.

  accountIds ----> some_instance_1

Suppose ThreadA is now working with some_instance_1. It started to search for accountId in it. While that operation is going, ThreadB changes what that reference is pointing to. So it becomes:

                   some_instance_1
  accountIds ----> some_instance_2

Because reference assign is atomic, this is what ThreadA will see also, if it reads that reference again. At this point in time, some_instance_1 is eligible for garbage collection, since no one refers to it. Just take note, that this will only happen if ThreadA sees this write that ThreadB did. Either way : you are safe (gc wise only), because ThreadA either works with a stale copy (which you said it's OK) or the most recent one.

This does not mean that everything is fine with your code.


What that answer gets right indeed is that reference assigning is atomic, so once a thread writes to a reference (accountIds = getAccountIds()), a reading thread (accountIds.contains(instanceId);) that will indeed perform the read will see the write. I say "indeed" because an optimizer might not even issue such a read, to begin with. In very simple (and somehow wrong) words, each thread might get its own copy of accountIds and because that is a "plain" read without any special semantics (like volatile, release/acquire, synchronization, etc), reading threads have no obligation to see the writing thread action.

So, even if someone actually did accountIds = getAccountIds(), it does not mean that reading threads will see this. And it gets worse. This write might not ever be seen. You need to introduce special semantics if you want guarantees (and you absolutely do).

For that you need to make your Set volatile:

private volatile Set<String> accountIds = ...

so that when multi threads are involved, you would get the needed visibility guarantees.

Then not to interfere with any in-flight updates of accountIds, you can simply work on a local copy of it:

 public boolean doesAccountExist(String accountId) {
      Set<String> local = accountIds;
      return local.contains(accountId);
 }  

Even if accountIds change while you are in this method, you do not care about that change, since you are searching against local, which is unaware of the change.

Eugene
  • 117,005
  • 15
  • 201
  • 306