0

The situation is as follows: I implement CharSequence over large text files (in order to be able to pass them to a Pattern).

I have a list of CharWindows which is a pretty simple class:

public final class CharWindow
{
    private final long fileOffset;
    private final long mappingLength;
    private final int charOffset;
    private final int charLength;
    // Constructor, methods, etc etc
}

In a separate class, I generate instances of CharWindows starting from the beginning of the file to the end; during that process, I increment an AtomicInteger (let us call it totalChars) which is the total number of characters in the file.

Now, let us imagine for a moment that a caller calls .charAt(25030) on the CharSequence; but at that time, the reader/decoder class has only completed (successfully) decoding 10430 characters; and it continues on: 15640, 21032, 25602 -- each time updating totalChars. And other callers may call .charAt() with different arguments as well.

Let us say that the reader/decoder class has a (thread safe, concurrent-friendly) method has a .needChars() with an int as an argument and the code of .charAt() in the CharSequence implementation reads:

@Override
public char charAt(final int index)
{
    readerDecoder.needChars(index);
    // do whatever is needed to read the chars
}

Is there a way implement .needChars() so that it block waiting for totalChars to reach an appropriate value?

fge
  • 119,121
  • 33
  • 254
  • 329

1 Answers1

3

I implemented something similar using a PriorityBlockingQueue and CountDownLatches. The idea is to bind in your case the required number of characters to a latch, which can be awaited, e.g.

public class RequireCharacters implements Comparable<RequireCharacters> {
   public final long required;
   public final CountDownLatch latch = new CountDownLatch(1);

   /* ctor etc. */

   public int compareTo(RequireCharacters other) { return Long.compare(this.required, other.required); }
}

Then you can use a single PriorityBlockingQueue to manage threads, which are waiting for more characters. If a thread in needChars() finds, that there are not enough characters processed yet, he submits a RequireCharacters object to the queue and awaits the contained latch. Whenever the character count is increased, the increasing thread has to check the queue. As the queued elements are ordered, it can peek for and release waiting threads until more characters have to be processed to satisfy a waiter.

There is a slight chance, that a thread observes an insufficient character count, but submits a RequireCharacters object after another thread increased the character count. Either synchronize appropriately or re-check the count after submitting to the queue.

Pyranja
  • 3,529
  • 22
  • 24
  • I forgot to add it, but I think that the waiting time must be significant to justify the overhead of such synchronization. If the average waiting time is rather short, you may be better off by simply spin waiting on the AtomicLong (maybe add some Thread.yield()s). – Pyranja Feb 26 '14 at 22:58
  • Well, it depends on the initial file size... If it's several hundreds of megabytes long, it can take some time to verify that all is text/calculate the offsets/etc. Also, I'll shortcut if there is no need to wait, of course, which will be eventually the case for all callers. – fge Feb 26 '14 at 23:12