I want to parse a file and keep it in memory as a Map<aID, Set<bID>>.
unique_a_IDs = 50,000;
unique_b_IDs = 1,000;
avg_set_length = 50;
As you can see, all the sets together will hold unique_a_IDs * avg_set_length = 2,500,000 bIDs, where each bID is in the range 0 to 1000. So on average each bID value will be stored 2,500 times, and I don't want the JVM to allocate memory 2,500 times for each integer.
Is there any trick to keep that data structure memory-efficient?
The problem is that I can't (at least, I don't know how yet) use Java's integer/string pools: the Integer cache only covers values in the range -128 to 127, and the String pool only works for compile-time constants, while I read my bIDs from a file.
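For illustration, here is a minimal demo of that default cache boundary (as an aside, the upper bound of the Integer cache can be raised JVM-wide with the HotSpot flag -XX:AutoBoxCacheMax, e.g. -XX:AutoBoxCacheMax=1000, though that is a global setting):

public class IntegerCacheDemo {
    public static void main(String[] args) {
        Integer a = Integer.valueOf(127), b = Integer.valueOf(127);
        Integer c = Integer.valueOf(128), d = Integer.valueOf(128);
        System.out.println(a == b); // true: 127 is served from the shared cache
        System.out.println(c == d); // false: 128 is boxed into a fresh object each time
    }
}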
Code example
import java.util.*;

public class MemoryTest {
    private static final int A_IDS_AMOUNT = 65536;
    private static final int B_IDS_AMOUNT = 1000;
    private static final int AVERAGE_SET_LENGTH = 50;
    private static final Random rand = new Random();

    public static void main(String[] args) {
        Map<Integer, Set<Integer>> map = new HashMap<>(A_IDS_AMOUNT);
        for (int i = 0; i < A_IDS_AMOUNT; i++) {
            Set<Integer> set = genRandomSet();
            map.put(i, set);
        }
        // SizeOf is a premain agent class that uses the java.lang.instrument API
        long sizeInMb = new SizeOf().deepsize(map) / (1024 * 1024);
        System.out.println("Memory used by map: " + sizeInMb + " MB"); // results in 175 MB
    }

    private static Set<Integer> genRandomSet() {
        Set<Integer> set = new HashSet<>(AVERAGE_SET_LENGTH);
        for (int i = 0; i < AVERAGE_SET_LENGTH; i++) {
            set.add(rand.nextInt(B_IDS_AMOUNT)); // autoboxes a new Integer for values > 127
        }
        return set;
    }
}
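Since the bID range is small and known up front, here is a minimal sketch of one hand-rolled pooling trick, assuming bIDs always stay within 0..999 as in the test above (BOXED_B_IDS is an illustrative name, not a JDK facility): box every possible bID exactly once and let all sets share those objects.

import java.util.*;

public class PooledMemoryTest {
    private static final int B_IDS_AMOUNT = 1000;
    private static final int AVERAGE_SET_LENGTH = 50;
    private static final Random rand = new Random();

    // One shared Integer per possible bID, boxed exactly once at startup
    private static final Integer[] BOXED_B_IDS = new Integer[B_IDS_AMOUNT];
    static {
        for (int i = 0; i < B_IDS_AMOUNT; i++) {
            BOXED_B_IDS[i] = i;
        }
    }

    private static Set<Integer> genRandomSet() {
        Set<Integer> set = new HashSet<>(AVERAGE_SET_LENGTH);
        for (int i = 0; i < AVERAGE_SET_LENGTH; i++) {
            // Reuse the pooled object instead of autoboxing a fresh one
            set.add(BOXED_B_IDS[rand.nextInt(B_IDS_AMOUNT)]);
        }
        return set;
    }

    public static void main(String[] args) {
        System.out.println(genRandomSet().size()); // at most 50 entries, all shared, pooled Integers
    }
}

This removes the duplicate Integer objects, but each HashSet entry still carries a few dozen bytes of node overhead on a typical 64-bit JVM; if only membership matters, a java.util.BitSet per aID (roughly 128 bytes for 1,024 bits) or a primitive-int set from a library such as fastutil or Trove would likely shrink the footprint much further.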