0

I already searched through some SO questions like this one, and this one, but no real conclusion could be made on my end. So, I'll lay it out:

I have the following nested map structure concept, intended for multi-threaded environment:

Map<Integer, HashMap<String, AtomicInteger>> bufferMap = new ConcurrentHashMap<Integer, HashMap<String, AtomicInteger>>(2);

This "buffer map" should basically stores some hourly counters (AtomicIntegers), identified / accessed by specific String keys. So, Integer keys for the buffer map are actually hour numbers (0...23). I only ever intend to "buffer" current and pre-set the next hour. To that effect, there's a timer task that runs once an hour, and does a maintaining procedure - something like this:

private final Map<Integer, HashMap<String, AtomicInteger>> bufferMap;
...
private final java.util.Timer;
private final java.util.TimerTask task = new TimerTask() {  
@Override
public void run() {
    ....
    HashMap<String, AtomicInteger> counterMap = bufferMap.get(previousHour);

    // now read internalMap's values, and "store/flush" them somewhere
    // at this point no thread but this one should access previous hour data

    initializeNextHourSlot(); // populate new map entry for the next hour with new AtomicInteger(0) values

    bufferMap.remove(previousHour); // clear previous hour, as no longer needed
}
}

Now, multiple threads may randomly and/or in parallel access this structure, to increment counters in the following manner:

bufferMap.get(currentHour).get(stringKey).incrementAndGet();

As outer (buffer) map is actually modified by different (Timer's) thread than the ones reading it, it was logical, I assume, to use ConcurrentHashMap.

However, I have my doubts about the inner (counter) map... It will always be populated by the timer thread ahead of time (no other threads should access it for at least an hour), and will then be accessed (read only), as shown above, to increment counter values.

Is this a thread safe approach, or not? And if not, what could be an alternative suggested data structure (and/or approach) ?

Community
  • 1
  • 1
Less
  • 3,047
  • 3
  • 35
  • 46

2 Answers2

1

Is this a thread safe approach, or not?

You can't be sure when a thread will run. Even if you create the structure an hour before you need it, the process could be placed in hibernation and still fail to run (in theory)

And if not, what could be an alternative suggested data structure (and/or approach) ?

A simpler approach is to not use a timer but a compute if absent.

final AtomicReference<TimedData> ref = new AtomicReference<>();

public void increment(String counter) {
   TimedData td = ref.get();
   long hour = System.currentTimeMillis() / 3_600_000;
   if (td.hour != hour) {
       saveData(td); // use back ground thread if needed.
       if (!ref.compareAndSet(td, new TimedData(hour))
            td = ref.get();
   }

   td.counterMap.get(counter)
                .incrementAndGet();
}

for the class

class TimedData {
    final long hour;
    final Map<String, AtomicInteger> counterMap = new HashMap<>();

    public TimedData(long hour) {
        this.hour = hour;
        // init the counterMap
    }

In this case, the background thread is optional and it doesn't matter when it runs.

Peter Lawrey
  • 525,659
  • 79
  • 751
  • 1,130
  • I gave this an up-vote, however, I'm not sure if I could accept it, as this seems to be Java 8 specific solution, and I forgot to mention that current production platform (still) uses Java 7. I would also have to spend some time to analyse the proposed logic, for my specific case. – Less Dec 29 '15 at 15:05
  • @Less if the `new TimedData` created all the entries needed, you could replace `computeIfAsbent` with `get` so you could use it on Java 5.0+ – Peter Lawrey Dec 29 '15 at 15:08
  • OK. The approach definitely seems interesting, and I will consider it. – Less Dec 29 '15 at 15:29
  • After re-reading this answer few times, I realized that I don't quite understand 2 things, and would appreciate additional clarification: 1) the answer to the first part of the question, about not knowing thread exec. time as well as the process hibernation part, and 2) what exact data type have you imagined for `td.counterMap`...? – Less Dec 29 '15 at 16:52
  • @Less 1) you cannot assume one thread will run before another, they run independently. 2) I have added a class sample. – Peter Lawrey Dec 29 '15 at 16:58
  • Yes, I see the edit. However, we're still left with this snippet: `td.counterMap.computeIfAbsent(...)` or this `td.counterMap.get(key)` that may execute simultaneously on a few different threads. Are you after all implying that plain `HashMap` read (`get()`) is thread-safe, which is kind of the essence of my question...? – Less Dec 29 '15 at 17:02
  • @Less `Map.get(x)` is thread safe if the map is not modified after it has been initialised in the constructor of `TimedData` – Peter Lawrey Dec 29 '15 at 17:14
  • - It looks like multiple threads may end up saving the data. - computeIfAbsent is not thread safe on HashMap. - It's not clear from your example how muptiple counters are kept consistent from a logger perspective. – Nitsan Wakart Dec 30 '15 at 21:12
  • @NitsanWakart goot point. This answer is a little confused as the second part assumes all the entries are created in advance. – Peter Lawrey Dec 30 '15 at 23:55
1

First off, this is not safe. Your timer and writers may concurrently read/write the current hour data. Consider that the line:

bufferMap.get(currentHour).get(stringKey).incrementAndGet();

Is not atomic and this means you actually have:

hourlyMap = bufferMap.get(currentHour);
// assume this thread was now suspended by the OS for some time
keyCounter = hourlyMap.get(stringKey);
keyCounter.incrementAndGet();

It seems to me what you are after is having 2 maps that the timer thread can swap between, but apart from efficiency this is not helpful to the problem above. Assuming you are looking to have a coherent snapshot of all counters (which is a strong requirement) you'll need to exclude writers from updating the counters while you read. There's an example of solving this exact problem in the HdrHistogram library Recorder class using a PhaseLock (http://hdrhistogram.org/), to hack in your map of counters:

public void incKey(String k) {
    long criticalValueAtEnter = recordingPhaser.writerCriticalSectionEnter();
    try {
        activeCounterMap.get(k).incrementAndGet();
    } finally {
        recordingPhaser.writerCriticalSectionExit(criticalValueAtEnter);
    }
}

private void sampleCounters() {
    try {
        recordingPhaser.readerLock();
        // ...swap your maps here...
        recordingPhaser.flipPhase(500000L /* yield in 0.5 msec units if needed */);
    } finally {
        recordingPhaser.readerUnlock();
    }
}

I believe there are other similar locks you can use to acheive the same effect. What you want is many writers to be prioritized over single reader I assume.

Nitsan Wakart
  • 2,841
  • 22
  • 27
  • "It seems to me what you are after is having 2 maps that the timer thread can swap between" - depending on the perspective, this may be true, as essentially I need to buffer counters' data for an hour, and after the hour expires, I need to "start fresh" for the newly entered hour, and flush /persist the data for the previous hour – Less Dec 29 '15 at 15:19
  • Further more, I think that this observation: "Your timer and writers may concurrently read/write the current hour data" is not quite correct, as timer would always in advance prepare 0-counters for the next hour, and would always operate on the previous hour data. Can you explain what you meant exactly, which scenario is when this may happen..? – Less Dec 29 '15 at 15:23
  • @Less Consider a writer starting their write at a fraction of a millisecond to the hour. They get the relevant HashMap and then they get suspended for a few milliseconds before updating it. They will write to the previous hour counters while your logger is busy reading them. – Nitsan Wakart Dec 30 '15 at 21:06
  • OK, I think I see your point... I forgot to mention though that scheduled maintenance procedure (i.e. timer thread, a reader) does not start exactly at 00:00.0 of every hour, but is deliberately delayed for, say, 15 minutes (so it runs in the 15th minute of every hour, and processes the previous hour data only). Maybe I'm mistaken - but something must go terribly wrong in order for a thread to be suspended for 15 minutes by the JVM, no...? – Less Dec 31 '15 at 13:43