There seems to be an issue with inserting into the hash table. I create about 8 threads, and each thread runs the code below. Each thread receives a char[] array, and its job is to tokenize this array by splitting on spaces. Once a token is found, I need to add it to the hash table if it doesn't exist yet. If it does exist, I need to add 1 to the current value of that token (the key).
Questions you might ask:
Why not convert from char[] to String?
I tried this, and since strings are immutable, I eventually ran out of memory (I am processing a 10 GB file) or spent too long garbage collecting. With Character[], I am able to reuse the same array and not take up extra space in memory.
What is the issue?
When I am done processing the entire file, I run the code:
for (Entry<Character[], Integer> e : wordCountMap.entrySet()) {
    System.out.println(Arrays.toString(e.getKey()) + " = " + e.getValue());
}
in my main function. What I get as a result is fewer than 100 key/value pairs. I know that there should be around 20,000. Somehow the entries seem to overlap.
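To illustrate the kind of overlap described above: Java arrays inherit equals and hashCode from Object, so a map compares array keys by identity, not by contents. The sketch below is not the code from this post. The class name ArrayKeyDemo and the method distinctKeysWithReusedArray are hypothetical, and it uses merge for brevity. It only shows that refilling and reusing one Character[] as the key leaves a single entry in the map.

```java
import java.util.Arrays;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

class ArrayKeyDemo {
    // Hypothetical helper: refills one Character[] per word and uses that same array as the key.
    static int distinctKeysWithReusedArray(String[] words) {
        ConcurrentMap<Character[], Integer> map = new ConcurrentHashMap<>();
        Character[] buffer = new Character[8];
        for (String word : words) {
            Arrays.fill(buffer, null);
            for (int i = 0; i < word.length() && i < buffer.length; i++) {
                buffer[i] = word.charAt(i);
            }
            // The key is the array object itself, so every word lands on the same entry.
            map.merge(buffer, 1, Integer::sum);
        }
        return map.size();
    }

    public static void main(String[] args) {
        Character[] a = {'c', 'a', 't'};
        Character[] b = {'c', 'a', 't'};
        System.out.println(a.equals(b));         // false: identity comparison
        System.out.println(Arrays.equals(a, b)); // true: element-wise comparison
        System.out.println(distinctKeysWithReusedArray(new String[] {"cat", "dog", "bird"})); // 1
    }
}
```

Because the map stores the array reference, later writes into the array also change what the stored key prints as.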
Character[] charArray = new Character[8];
for (i = 0; i < newbyte.length; i++) { // newbyte is a char[] from main
    if (newbyte[i] != ' ') {
        charArray[counter] = newbyte[i];
        counter++;
    }
    else {
        check = wordCountMap.putIfAbsent(charArray, 1);
        if (check != null) {
            wordCountMap.put(charArray, wordCountMap.get(charArray) + 1);
        }
        for (j = 0; j < counter; j++) {
            charArray[j] = null;
        } // Null out the array
    }
}
ConcurrentMap<Character[], Integer> wordCountMap; // this is the definition in main
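A side note on the update step, separate from the key problem: putIfAbsent followed by get and put is several independent map operations, so two threads can read the same count and one increment can be lost. A minimal sketch of an atomic alternative, assuming String keys purely for illustration (the class AtomicCountDemo and its method are hypothetical names), uses ConcurrentHashMap.merge:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

class AtomicCountDemo {
    // Hypothetical helper: several threads increment the same key with merge.
    static int countWithMerge(int threads, int incrementsPerThread) {
        ConcurrentMap<String, Integer> counts = new ConcurrentHashMap<>();
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < incrementsPerThread; i++) {
                    // Inserts 1 if the key is absent, otherwise adds 1, as one atomic step.
                    counts.merge("word", 1, Integer::sum);
                }
            });
            workers[t].start();
        }
        try {
            for (Thread worker : workers) {
                worker.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return counts.get("word");
    }

    public static void main(String[] args) {
        System.out.println(countWithMerge(8, 1000)); // 8000
    }
}
```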
As some of the comments below have suggested, I am actually passing a reference to charArray, not a copy of its contents, when the line:
wordCountMap.put(charArray, wordCountMap.get(charArray) + 1);
is executed. So my question is: how do I pass the value instead? It actually makes perfect sense now, as in the end there are about 320 key/value pairs: 8 threads times 40 loops (each thread gets 250/8 MB per iteration).
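One possible way to pass the value is to copy the filled part of the buffer into a new key object at the moment of insertion, so the map never holds a reference to the reusable buffer. The sketch below is an assumption-laden illustration, not a tested fix for the 10 GB case. The names WordCounter and countWords are hypothetical, it uses a primitive char[] buffer, and it uses String keys, which allocates one short-lived String per token while the map retains only one key per distinct word.

```java
import java.util.Arrays;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

class WordCounter {
    // Hypothetical helper: splits chunk on spaces and counts each token by value.
    static void countWords(char[] chunk, ConcurrentMap<String, Integer> counts) {
        char[] buffer = new char[16]; // reusable scratch space, grown on demand
        int length = 0;
        for (char c : chunk) {
            if (c != ' ') {
                if (length == buffer.length) {
                    buffer = Arrays.copyOf(buffer, buffer.length * 2);
                }
                buffer[length++] = c;
            } else if (length > 0) {
                // new String(...) copies the characters, so reusing buffer afterwards is safe.
                counts.merge(new String(buffer, 0, length), 1, Integer::sum);
                length = 0;
            }
        }
        if (length > 0) { // last token when the chunk does not end with a space
            counts.merge(new String(buffer, 0, length), 1, Integer::sum);
        }
    }

    public static void main(String[] args) {
        ConcurrentMap<String, Integer> counts = new ConcurrentHashMap<>();
        countWords("a b a  c a".toCharArray(), counts);
        System.out.println(counts.get("a")); // 3
        System.out.println(counts.size());   // 3
    }
}
```

String compares by contents in equals and hashCode, which is what makes it usable as a map key here. Any other key type with content-based equality would serve the same purpose.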