My main program looks like this (pseudocode):
    public static void main(String[] args) {
        // produce lots of int[] data, stored in a list of hash maps
        List<HashMap<Integer, int[]>> dataArray1 = new ArrayList<>();
        ...

        // create a new list of data, similar to dataArray1;
        // now we write into dataArray2 and read from dataArray1
        List<HashMap<Integer, int[]>> dataArray2 = new ArrayList<>();
        while (!exitCondition) {
            ...
            for (/* each pair (index1, index2) in a set of indices */) {
                int[] a1 = dataArray1.get(index1).get(key1);
                int[] a2 = dataArray1.get(index2).get(key2);
                int[] b = intersect(a1, a2);
                int i = generateIndex(index1, index2);
                int key = generateKey(key1, key2);
                dataArray2.get(i).put(key, b);
            }
        }
        // now dataArray1 can be released
        dataArray1 = null;

        // create a new list of data, similar to dataArray2;
        // now we write into dataArray3 and read from dataArray2
        List<HashMap<Integer, int[]>> dataArray3 = new ArrayList<>();
        while (!exitCondition) {
            ...
            for (/* each pair (index1, index2) in a set of indices */) {
                int[] a1 = dataArray2.get(index1).get(key1);
                int[] a2 = dataArray2.get(index2).get(key2);
                int[] b = intersect(a1, a2);
                int i = generateIndex(index1, index2);
                int key = generateKey(key1, key2);
                dataArray3.get(i).put(key, b);
            }
        }
        // now dataArray2 can be released
        dataArray2 = null;
        ...
        // and so on, 20 times
    }
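The pseudocode leaves `intersect a1 and a2` unspecified. As a minimal sketch, assuming each `int[]` is sorted ascending without duplicates (the question does not say so), a two-pointer merge intersects two arrays in O(|a1| + |a2|) time without building any temporary set. The class name `SortedIntersect` is hypothetical.

```java
import java.util.Arrays;

// Sketch: intersect two int[] arrays, ASSUMING both are sorted ascending
// and contain no duplicates (not stated in the question).
class SortedIntersect {
    static int[] intersect(int[] a1, int[] a2) {
        // the result can never be longer than the shorter input
        int[] out = new int[Math.min(a1.length, a2.length)];
        int i = 0, j = 0, n = 0;
        while (i < a1.length && j < a2.length) {
            if (a1[i] < a2[j]) {
                i++;                // advance the side with the smaller value
            } else if (a1[i] > a2[j]) {
                j++;
            } else {
                out[n++] = a1[i];   // common value: keep it, advance both sides
                i++;
                j++;
            }
        }
        return Arrays.copyOf(out, n); // trim to the actual result size
    }

    public static void main(String[] args) {
        int[] b = intersect(new int[]{1, 3, 5, 7, 9}, new int[]{3, 4, 5, 9, 10});
        System.out.println(Arrays.toString(b)); // prints [3, 5, 9]
    }
}
```

If the arrays are not sorted, sorting them once when they are produced keeps every later intersection linear.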
My problem is that at some point `dataArrayk`, for some `k > 1`, becomes heavy (say 20 GB), so storing it in memory is impossible. I could change `int[]` to `BitSet`, but that does not help: even more memory is used.
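One likely reason `BitSet` makes things worse: a `BitSet` allocates one bit for every value up to the largest set bit, so its payload is about max/8 bytes, while an `int[]` costs 4n bytes for n elements. The `BitSet` is therefore only smaller when 4n > max/8, i.e. when n > max/32 (more than one value in every 32 is present). The sketch below, with a hypothetical class name, compares both for a sparse array and ignores object headers.

```java
import java.util.BitSet;

// Sketch: payload size of an int[] versus a BitSet holding the same values.
// Object headers and array bookkeeping are ignored.
class BitSetVsIntArray {
    static long intArrayPayloadBytes(int[] values) {
        return 4L * values.length; // 4 bytes per int
    }

    static long bitSetPayloadBytes(int[] values) {
        BitSet bits = new BitSet();
        for (int v : values) {
            bits.set(v); // grows the backing storage up to bit v
        }
        // size() = bits of backing storage actually allocated (a multiple of 64)
        return bits.size() / 8L;
    }

    public static void main(String[] args) {
        int[] sparse = {5, 1_000_000};
        System.out.println(intArrayPayloadBytes(sparse)); // prints 8
        // about 125 KB; the exact value depends on BitSet's growth policy
        System.out.println(bitSetPayloadBytes(sparse));
    }
}
```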
The solution would be to use either a database or the file system. Which would you recommend? I need performance (execution time); memory does not matter. If your experience says database, please recommend the fastest interface for a specific database (which one?), be it db4 (Berkeley DB), PostgreSQL, or something else. If it says file system, please recommend the fastest interface (file libraries).
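For the file-system route, here is a minimal sketch assuming one `HashMap<Integer, int[]>` per file: buffered `DataOutputStream`/`DataInputStream` write and read raw ints with little overhead. The on-disk layout (entry count, then key, array length and values per entry), the 64 KiB buffer size and the class name `MapFile` are assumptions, not a standard format.

```java
import java.io.*;
import java.nio.file.*;
import java.util.HashMap;
import java.util.Map;

// Sketch: one HashMap<Integer, int[]> per binary file.
// Assumed layout: [entryCount] then per entry [key][length][values...].
class MapFile {
    static void write(Path file, Map<Integer, int[]> map) throws IOException {
        try (DataOutputStream out = new DataOutputStream(
                new BufferedOutputStream(Files.newOutputStream(file), 1 << 16))) {
            out.writeInt(map.size());
            for (Map.Entry<Integer, int[]> e : map.entrySet()) {
                out.writeInt(e.getKey());
                int[] values = e.getValue();
                out.writeInt(values.length);
                for (int v : values) {
                    out.writeInt(v);
                }
            }
        }
    }

    static HashMap<Integer, int[]> read(Path file) throws IOException {
        try (DataInputStream in = new DataInputStream(
                new BufferedInputStream(Files.newInputStream(file), 1 << 16))) {
            int entries = in.readInt();
            HashMap<Integer, int[]> map = new HashMap<>();
            for (int i = 0; i < entries; i++) {
                int key = in.readInt();
                int[] values = new int[in.readInt()];
                for (int j = 0; j < values.length; j++) {
                    values[j] = in.readInt();
                }
                map.put(key, values);
            }
            return map;
        }
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("map-", ".bin");
        HashMap<Integer, int[]> map = new HashMap<>();
        map.put(42, new int[]{1, 2, 3});
        write(file, map);
        System.out.println(read(file).get(42).length); // prints 3
        Files.delete(file);
    }
}
```

Whether this beats an embedded database depends on the access pattern; it is mainly a baseline to measure against.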
As for read/write statistics: in each `while` loop of my code I do three times more reads than writes. For example, for one level `k` I read from `dataArray_k` 12000 times and write into `dataArray_(k+1)` 4000 times.
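Since reads outnumber writes about 3:1, a bounded in-memory cache in front of the disk can absorb repeated reads of the same map, provided the same indices are hit repeatedly. Below is a minimal sketch using `LinkedHashMap` in access order with `removeEldestEntry`, a common way to build an LRU cache in Java; the class name, the capacity and the `loader` (for example, reading one map file) are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.IntFunction;

// Sketch: LRU cache in front of a slow loader (hypothetical, e.g. one file per map).
class LruCache<V> {
    private final IntFunction<V> loader;
    private final LinkedHashMap<Integer, V> entries;
    int loads = 0; // number of times the loader was actually called

    LruCache(int capacity, IntFunction<V> loader) {
        this.loader = loader;
        // accessOrder = true: iteration order runs from least to most recently used
        this.entries = new LinkedHashMap<Integer, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<Integer, V> eldest) {
                return size() > capacity; // evict the least recently used entry
            }
        };
    }

    V get(int index) {
        V v = entries.get(index); // a hit also marks the entry as most recent
        if (v == null) {
            v = loader.apply(index); // miss: load and remember it
            loads++;
            entries.put(index, v);
        }
        return v;
    }

    public static void main(String[] args) {
        LruCache<String> cache = new LruCache<>(2, i -> "map-" + i);
        cache.get(1);
        cache.get(2);
        cache.get(1); // hit; 1 is now the most recent
        cache.get(3); // miss; evicts 2
        cache.get(1); // hit
        cache.get(2); // miss; reloaded
        System.out.println(cache.loads); // prints 4
    }
}
```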
I can store each hash map from `List<HashMap<Integer, int[]>> dataArray1` in a separate file.
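If each hash map gets its own file, one possible layout is a directory per level with one file per list index, deleted as a whole once the next level has been built (the on-disk counterpart of `dataArray1 = null`). Below is a minimal sketch using built-in Java serialization, since `HashMap` and `int[]` are both `Serializable`; the names `LevelStore`, `level-k` and `map-i.ser` are hypothetical, and a hand-written binary format may be faster if serialization turns out to be a bottleneck.

```java
import java.io.*;
import java.nio.file.*;
import java.util.HashMap;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Sketch: level k stored as root/level-k/map-<index>.ser, one file per hash map.
class LevelStore {
    private final Path dir;

    LevelStore(Path root, int level) throws IOException {
        this.dir = Files.createDirectories(root.resolve("level-" + level));
    }

    private Path fileFor(int index) {
        return dir.resolve("map-" + index + ".ser");
    }

    void put(int index, HashMap<Integer, int[]> map) throws IOException {
        try (ObjectOutputStream out = new ObjectOutputStream(
                new BufferedOutputStream(Files.newOutputStream(fileFor(index))))) {
            out.writeObject(map);
        }
    }

    @SuppressWarnings("unchecked")
    HashMap<Integer, int[]> get(int index) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(
                new BufferedInputStream(Files.newInputStream(fileFor(index))))) {
            return (HashMap<Integer, int[]>) in.readObject();
        }
    }

    // Remove the whole level once level k+1 has been built.
    void deleteLevel() throws IOException {
        List<Path> files;
        try (Stream<Path> listing = Files.list(dir)) {
            files = listing.collect(Collectors.toList()); // snapshot before deleting
        }
        for (Path p : files) {
            Files.delete(p);
        }
        Files.delete(dir);
    }

    public static void main(String[] args) throws Exception {
        Path root = Files.createTempDirectory("levels");
        LevelStore level1 = new LevelStore(root, 1);
        HashMap<Integer, int[]> m = new HashMap<>();
        m.put(7, new int[]{1, 2});
        level1.put(0, m);
        System.out.println(level1.get(0).get(7)[1]); // prints 2
        level1.deleteLevel();
        Files.delete(root);
    }
}
```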