2

I am using Java to read data from a file, copy the data into smaller arrays, and put these arrays into hash maps. I noticed that the HashMap consumes more memory (about double) than what is in the original file! Any idea why?

Here is my code:

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.FileWriter;
import java.io.IOException;
import java.io.PrintWriter;
import java.util.HashMap;

public static void main(final String[] args) throws IOException {
    // generate roughly 27 MB of dummy text
    final PrintWriter writer = new PrintWriter(new FileWriter("test.txt",
            true));
    for (int i = 0; i < 1000000; i++)
        writer.println("This is just a dummy text!");
    writer.close();

    // read every line back into a HashMap
    final BufferedReader reader = new BufferedReader(new FileReader(
            "test.txt"));
    final HashMap<Integer, String> testMap = new HashMap<Integer, String>();
    String line = reader.readLine();
    int k = 0;
    while (line != null) {
        testMap.put(k, line);
        k++;
        line = reader.readLine();
    }
    reader.close();
}
rolve
user1785771

3 Answers

7

This is not a problem of HashMap specifically; it's a problem of Java objects in general. Each object has a certain memory overhead, including the arrays and the entries in your HashMap.

But more importantly: character data consumes double the space in memory. The reason is that Java uses 16 bits (two bytes) for each character, whereas the file is probably encoded in ASCII or UTF-8, which use only a single byte per character for text like this.
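
To make the doubling concrete for a single line of the file (a quick sketch, assuming plain ASCII content):

import java.nio.charset.StandardCharsets;

String line = "This is just a dummy text!";
int onDisk = line.getBytes(StandardCharsets.US_ASCII).length; // 26 bytes in the file
int inMemory = line.length() * 2;                             // 52 bytes as 16-bit chars
System.out.println(onDisk + " bytes on disk vs. " + inMemory + " bytes in memory");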

Update: There is not much you can do about this. The code you posted is fine in principle; it just doesn't work with huge files. You might do a little better if you tune your HashMap carefully, or you could store a byte array instead of a String for each line (assuming everything is ASCII or one-byte UTF-8), as sketched below.
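
A minimal sketch of the byte-array idea (assuming one-byte-per-character content; StandardCharsets requires Java 7):

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;

public static void main(String[] args) throws IOException {
    // store each line's bytes instead of a String, roughly halving the character data
    final BufferedReader reader = new BufferedReader(new FileReader("test.txt"));
    final HashMap<Integer, byte[]> testMap = new HashMap<Integer, byte[]>();
    String line;
    int k = 0;
    while ((line = reader.readLine()) != null) {
        testMap.put(k++, line.getBytes(StandardCharsets.US_ASCII));
    }
    reader.close();

    // decode a line back to a String only when you actually need it
    String first = new String(testMap.get(0), StandardCharsets.US_ASCII);
    System.out.println(first);
}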

But in the end, to solve your out-of-memory problems, the right way to go is to rethink your program so that you don't have to read the whole file into memory at once.

Whatever it is you're doing with the content of that file, think about whether you can do it while reading the file from disk (this is called streaming), or whether you can extract just the relevant parts and store only those. You could also use random access on the file.
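
For example, a streaming version of the code in the question holds only one line in memory at a time (processLine is a hypothetical stand-in for whatever you actually do with each line):

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;

static void processLine(String line) {
    // hypothetical placeholder: do the real work with each line here
}

public static void main(String[] args) throws IOException {
    BufferedReader reader = new BufferedReader(new FileReader("test.txt"));
    String line;
    while ((line = reader.readLine()) != null) {
        processLine(line); // work on the line instead of storing it
    }
    reader.close();
}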

I suggest you read up on those things a bit, try something, and then come back and ask a new question specific to your application, because this comment thread is getting too long.

rolve
  • +1 for the follow-up ;-) Indeed if this code runs out of memory, there is not much to do but to not store the whole file in memory. – assylias Nov 01 '12 at 23:10
6

A map is an "extendable" structure: when it reaches its capacity, it gets resized. So it is possible that, say, 40% of the space used by your map is actually empty. If you know how many entries will be in your map, you can use the appropriate constructor to size it optimally:

Map<K, V> map = new HashMap<>(length, 1f);

Even if you do that, the map will still use more space than the actual size of the contained items.

In more detail: a HashMap's capacity is doubled when its size reaches (capacity * loadFactor). The default load factor for a HashMap is 0.75.

Example:

  • Imagine your map has a capacity of 10,000 buckets.
  • You then put 7,501 entries in the map: capacity * loadFactor = 10,000 * 0.75 = 7,500.
  • So your map has crossed its resize threshold, and its capacity is doubled to 20,000 even though it only holds 7,501 entries. That wastes a lot of space. A way to avoid the resize entirely is sketched after this list.
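
If you know the entry count in advance, a small sizing sketch (the exact memory saved still depends on the JVM):

import java.util.HashMap;
import java.util.Map;

int entries = 1000000;                          // number of entries, known in advance
int capacity = (int) Math.ceil(entries / 0.75); // stays under the resize threshold
Map<Integer, String> map = new HashMap<>(capacity);
// HashMap rounds the capacity up to a power of two internally,
// so no resize will occur for up to 1,000,000 puts.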

EDIT

This simple code gives you an idea of what happens in practice - the output is:

threshold of empty map = 8192
size of empty map = 35792
threshold of filled map = 8192
size of filled map = 1181712
threshold with one more entry = 16384
size with one more entry = 66640

which shows that if the last item you add happens to force the map to resize, it can artificially increase the size of your map. Admittedly, that does not account for the whole effect that you are observing.

import java.lang.reflect.Field;
import java.util.HashMap;
import java.util.Map;

public static void main(String[] args) throws Exception {
    // peek at HashMap's private resize threshold via reflection
    Field f = HashMap.class.getDeclaredField("threshold");
    f.setAccessible(true);

    long mem = Runtime.getRuntime().freeMemory();
    Map<String, String> map = new HashMap<>(2 << 12, 1); // capacity 8,192, load factor 1
    System.out.println("threshold of empty map = " + f.get(map));
    System.out.println("size of empty map = " + (mem - Runtime.getRuntime().freeMemory()));

    // fill the map exactly up to its threshold: no resize happens yet
    mem = Runtime.getRuntime().freeMemory();
    for (int i = 0; i < 8192; i++) {
        map.put(String.valueOf(i), String.valueOf(i));
    }
    System.out.println("threshold of filled map = " + f.get(map));
    System.out.println("size of filled map = " + (mem - Runtime.getRuntime().freeMemory()));

    // one more entry pushes the map over its threshold and doubles it
    mem = Runtime.getRuntime().freeMemory();
    map.put("a", "a");
    System.out.println("threshold with one more entry = " + f.get(map));
    System.out.println("size with one more entry = " + (mem - Runtime.getRuntime().freeMemory()));
}
assylias
  • Hash tables have a fair amount of overhead -- in exchange for supporting constant-time lookup. – Louis Wasserman Nov 01 '12 at 16:35
  • @LouisWasserman I would not think the overhead would account for such a difference. – assylias Nov 01 '12 at 16:36
  • Yes, I read about that @assylias, but it doubles its capacity, not its real size. If the values are object references, how would it know how much memory it will need? – user1785771 Nov 01 '12 at 16:44
  • I don't think that the overhead of the HashMap is the problem. After all, the unused entries only consume a pointer-sized memory chunk, i.e. 4 or 8 bytes. The arrays containing the data are likely much larger. – rolve Nov 01 '12 at 16:44
  • I agree with @rolve. But what can I do? – user1785771 Nov 01 '12 at 17:12
  • @assylias, yes, I see your point. But what is the workaround? If I load a 20 MB file, I find that it consumes 80 MB! Any workaround? – user1785771 Nov 01 '12 at 18:25
  • @user1785771 Post the *relevant parts* of your code, then we might be able to help you. – rolve Nov 01 '12 at 18:30
  • @user1785771 You could load the whole file - determine the optimal size of the map, create it and populate it - that involves an extra copy operation which should not be too expensive vs. the time required to read a file. Now depending on what you put in the map and how you put it the final memory consumption can vary widely. Showing how you populate the map would help. – assylias Nov 01 '12 at 18:31
  • It would also be interesting to know how you measure memory usage - do you measure it in a profiler or do you simply check the amount of memory used by the JVM process (which would include some unused heap space depending on the parameters you passed to the JVM). – assylias Nov 01 '12 at 18:37
  • I just used Runtime.getRuntime(). But the thing is that I have limited memory, so I am getting an OutOfMemoryError. I am new here, I don't know how to write the code in a clear code section like you guys are doing... is there a tag? – user1785771 Nov 01 '12 at 18:50
  • @user1785771 you can paste the code in your message, select it, and click on the curly braces button. Or simply indent the code by 4 spaces to get the same result. – assylias Nov 01 '12 at 19:44
  • public static void main(String[] args) { BufferedReader reader = null; PrintWriter writer = new PrintWriter(new FileWriter("test.txt", true)); for (int i = 0; i < 1000000; i++) writer.println("This is just a dummy text!"); writer.close(); BufferedReader reader = new BufferedReader(new FileReader(file)); HashMap testMap = new HashMap(); String line = reader.readLine(); int k = 0; while (line != null) { testMap.put(k, line); k++; line = reader.readLine();} – user1785771 Nov 01 '12 at 20:03
  • Sorry, I couldn't format the code properly. But this code generates about 28 MB of text and then reads it into a HashMap. I am using 100 MB for the heap (-Xmx100m) and I am getting an out-of-memory error with just this code! @assylias – user1785771 Nov 01 '12 at 20:05
0

There are lots of things internal to the implementation of HashMap (and of arrays) that need to be stored; array lengths are one such example. Not sure if this would account for the full doubling, but it could certainly account for some of it.
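
As a rough back-of-envelope for one entry of the map in the question (these figures assume a 64-bit HotSpot JVM with compressed oops and are only estimates):

// Approximate cost of testMap.put(k, "This is just a dummy text!"):
int entry  = 32;          // HashMap.Entry: header + hash + key/value/next references
int key    = 16;          // boxed Integer key
int string = 32;          // String object: header + value/offset/count/hash fields
int chars  = 16 + 26 * 2; // char[] backing array: header + two bytes per character
System.out.println(entry + key + string + chars); // ~148 bytes in memory,
                                                  // vs. 27 bytes (line + newline) on disk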

Jon Newmuis