
I need to load a number of files into memory so that multiple threads can access them quickly. There are around 20 files, ranging from 10 MB to 200 MB in size, for a total of around 3 GB. I'm trying to store the files as Strings in a Map.

Below is the code I'm using. I loop through the input directory, select specific files, and use a StringBuilder to concatenate the lines of each file, since the files are \n delimited and I need a single string.

Each file contains one header line, starting with >, which I'm using to create the key for the map.

The startRAM and endRAM variables and the loop at the end are only there to help work out what's going on.

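private Map<String, String> readFilesIntoMap(File referenceDirectory) throws IOException {
        Map<String, String> refMap = new HashMap<>();

        // Used heap = total - free; freeMemory() alone shifts as the heap grows
        Runtime runtime = Runtime.getRuntime();
        long startRAM = runtime.totalMemory() - runtime.freeMemory();

        // listFiles() returns null if the path is not a readable directory
        File[] refFiles = referenceDirectory.listFiles();
        if (refFiles == null) throw new IOException("Cannot list " + referenceDirectory);

        for (File refFile : refFiles) {
            if (refFile.getName().endsWith(".fa")) {
                String mapKey = "";
                StringBuilder mapValue = new StringBuilder();

                // try-with-resources closes the reader even if reading fails
                try (BufferedReader br = new BufferedReader(new FileReader(refFile))) {
                    String currentLine;
                    while ((currentLine = br.readLine()) != null) {
                        // Header line ">name description": the key is "name"
                        if (currentLine.startsWith(">")) mapKey = currentLine.split(" ")[0].substring(1);
                        else mapValue.append(currentLine);
                    }
                }

                refMap.put(mapKey, mapValue.toString());
            }
        }

        // Diagnostics only: print the start and the byte size of each value
        for (Map.Entry<String, String> entry : refMap.entrySet()) {
            String value = entry.getValue();
            System.out.println(entry.getKey() + ": " + value.substring(0, Math.min(10, value.length())));
            System.out.println(value.getBytes().length);
        }

        long endRAM = runtime.totalMemory() - runtime.freeMemory();

        System.out.println(endRAM - startRAM);

        return refMap;
}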

The code works perfectly fine, and the map is created. My issue, however, is that the program consumes significantly more RAM than the file sizes. The files together are 2.9 GB, and at its peak the program reaches 9 GB of RAM.

I realise that the StringBuilder will be in RAM as well as the String value stored in the map. However, I don't see how the RAM usage can be 3x the size of the data being read in.
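To make that accounting concrete, here is a minimal sketch (the class and method names are hypothetical, not part of the code above) that compares a default StringBuilder with one presized to the expected length. It uses generated lines rather than the real files. A builder that grows on demand can end up with more capacity than content, and toString() then copies the content into a new String, so two full copies of a file's text can briefly coexist.

```java
import java.util.Arrays;

// Hypothetical demo class, for illustration only.
class BuilderCapacityDemo {

    // Appends `lines` lines of `lineLength` characters to a builder that
    // starts with the given initial capacity, and returns the builder.
    static StringBuilder build(int lines, int lineLength, int initialCapacity) {
        StringBuilder sb = new StringBuilder(initialCapacity);
        char[] line = new char[lineLength];
        Arrays.fill(line, 'A');
        for (int i = 0; i < lines; i++) {
            sb.append(line);
        }
        return sb;
    }

    public static void main(String[] args) {
        int lines = 100_000;
        int lineLength = 60;

        // Default-sized builder: its backing array is reallocated as it grows.
        StringBuilder grown = build(lines, lineLength, 16);
        // Presized builder: the backing array is allocated once.
        StringBuilder presized = build(lines, lineLength, lines * lineLength);

        System.out.println("grown:    length=" + grown.length() + " capacity=" + grown.capacity());
        System.out.println("presized: length=" + presized.length() + " capacity=" + presized.capacity());
    }
}
```

For the files in question, refFile.length() would probably be a reasonable upper bound for the initial capacity, as long as it fits in an int.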

Any ideas?

Many thanks,

Sam

  • "My issue, however is that the program consumes SIGNIFICANTLY more RAM than the file sizes." Because abstraction has a cost. Information stored in Java objects of course consumes more bytes than in its plain form. – davidxxx Nov 17 '17 at 15:27
  • Up to three times as much? When I print out the size of the values (the files in String form), the size is pretty much the same as the input itself. – Sam Nov 17 '17 at 15:28
  • The internal char representation is UTF-16. A String has a minimum size whatever its length, and so on... https://stackoverflow.com/questions/31206851/how-much-memory-does-a-string-use-in-java-8 – davidxxx Nov 17 '17 at 15:31
  • It seems that as the String gets large (and it does) the size in bytes tends towards twice the number of characters. That would explain part of the RAM increase. – Sam Nov 17 '17 at 15:35
  • Possibly `StringBuilder`'s approach to ensuring enough capacity. [This](https://stackoverflow.com/questions/26098281/how-does-ensurecapacity-work-in-java) might help. – Andrew S Nov 17 '17 at 15:57
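
Following up on the UTF-16 point in the comments, here is a rough sketch, assuming the files are plain ASCII, of keeping each sequence as a byte[] (one byte per character) instead of a String. FastaBytes and readSequence are hypothetical names, not from the question. On Java 8 and earlier a String stores two bytes per character; from Java 9 onwards, compact strings can store Latin-1 text in one byte per character, so the saving depends on the JVM version.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper, for illustration only.
class FastaBytes {

    // Reads one file, skipping header lines (those starting with '>') and
    // all line breaks, and returns the remaining bytes as a single array.
    static byte[] readSequence(Path file) throws IOException {
        long size = Files.size(file);
        if (size > Integer.MAX_VALUE - 8) {
            throw new IOException("File too large for a single array: " + file);
        }
        // Presize the buffer to the file size so it never has to grow.
        ByteArrayOutputStream out = new ByteArrayOutputStream((int) size);
        try (InputStream in = new BufferedInputStream(Files.newInputStream(file))) {
            boolean inHeader = false;
            boolean atLineStart = true;
            int b;
            while ((b = in.read()) != -1) {
                if (b == '\n' || b == '\r') {
                    inHeader = false;
                    atLineStart = true;
                    continue;
                }
                if (atLineStart && b == '>') {
                    inHeader = true;
                }
                atLineStart = false;
                if (!inHeader) {
                    out.write(b);
                }
            }
        }
        // toByteArray() makes one copy, so the peak for a single file is
        // roughly twice its size in bytes before the buffer is released.
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("demo", ".fa");
        try {
            Files.write(tmp, ">chr1 example\nACGT\nTTGA\n".getBytes(StandardCharsets.US_ASCII));
            byte[] seq = readSequence(tmp);
            System.out.println(new String(seq, StandardCharsets.US_ASCII));
            System.out.println(seq.length + " bytes");
        } finally {
            Files.deleteIfExists(tmp);
        }
    }
}
```

The map would then be a Map<String, byte[]>, and threads reading it would index into the arrays directly rather than calling charAt.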

0 Answers