
I'm trying to read files from folders containing up to about 3,200 files and store them in a HashMap, where the folder name is the key and the value holds the files inside that folder. I'm getting a java.lang.OutOfMemoryError: Java heap space exception. My JVM heap settings are _JAVA_OPTIONS: -Xms1024m -Xmx2048m. How can I fix this?

    private static final String FOLDER_PATH = "H:\\NPA\\74RR\\Docs";

    public static Map<String, Map<Integer, Byte[]>> readFilesFromSystem() {

        File file = new File(FOLDER_PATH);

        
        // folders name example T74001, T74002, T74003
        String[] list = file.list();

        System.out.println(list.length);
        Map<String, Map<Integer, Byte[]>> map = new HashMap<>();
        // looping through single id documents
        for (String p : list) {
            Map<Integer, Byte[]> nestedMap = new HashMap<Integer, Byte[]>();
            File folder = new File(FOLDER_PATH + "\\" + p); // e.g. H:\NPA\74RR\Docs\T74002
            File[] listOfFiles = folder.listFiles();
            for (int i = 0; i < listOfFiles.length; i++) {
                File xFiles = listOfFiles[i];
                if (xFiles.isFile() && xFiles.getName().toLowerCase().endsWith(".pdf")) {
                    try {
                        byte[] readFileToByteArray = FileUtils.readFileToByteArray(xFiles);
                        Integer docName = removeExtension(xFiles.getName());
                        long sizeInMb = readFileToByteArray.length / (1024 * 1024);
                        System.out.println(p + "---File " + docName + " " + sizeInMb + "Mb");
                        Byte[] b = ArrayUtils.toObject(readFileToByteArray);
                        nestedMap.put(docName, b);
                    } catch (IOException e) {
                        e.printStackTrace();
                    } catch (InvalidFileFormatName e) {
                        e.printStackTrace();
                    }
                }
            }
            map.put(p, nestedMap);
        }
        return map;
    }

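    private static Integer removeExtension(String name) throws InvalidFileFormatName {
        try {
            // Strip the extension, then parse the remaining name as the document number.
            // Parsing happens inside the try so a non-numeric name is reported properly.
            return Integer.parseInt(name.substring(0, name.lastIndexOf(".")));
        } catch (NumberFormatException e) {
            throw new InvalidFileFormatName("Invalid File Format " + name);
        }
    }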

    public static void main(String[] args) {
        readFilesFromSystem();
    }
}

  • Please trim your code to make it easier to find your problem. Follow these guidelines to create a [minimal reproducible example](https://stackoverflow.com/help/minimal-reproducible-example). – Community Jul 23 '22 at 13:33

1 Answer

That exception means the heap (where the file data is stored) is not large enough to hold all the data your program loads. There are only two ways out: give the JVM more heap space, or make the program need less of it.

Increasing your heap size further, e.g. by changing the setting to -Xmx4096m, may solve the issue, depending on how large your files are. You can only raise that number until your computer reaches its physical memory limit, so it's a short-term solution at best: as soon as your program has to load more files, it will run out of memory again.
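To confirm that a changed heap setting is actually picked up, you can print the maximum heap the running JVM reports. This is a minimal sketch; the class name `HeapInfo` is made up for illustration.

```java
public class HeapInfo {

    /** Returns the maximum amount of memory the JVM will attempt to use, in MiB. */
    static long maxHeapMiB() {
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    public static void main(String[] args) {
        // Run with e.g. "java -Xmx4096m HeapInfo" and compare the printed value.
        System.out.println("Max heap: " + maxHeapMiB() + " MiB");
    }
}
```

The reported value is usually slightly below the -Xmx figure, because the JVM reserves part of the heap for its own bookkeeping.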

Make the program use less heap space. The only long-term solution is to not load all file contents at the same time. How to go about that depends entirely on what your program does with the data. A simple approach is to delay loading by storing only the `File` object (which is just a path, not the contents) in the map, so the nested map becomes `Map<Integer, File> nestedMap = new HashMap<>();` and you put the file in directly with `nestedMap.put(docName, xFiles);`. The call to `FileUtils.readFileToByteArray(xFiles)` then happens much later, when the actual contents are needed. Once a file's contents have been processed, drop the reference to the byte array right away so the garbage collector can free the space for the next file.

If you read the same files several times, this carries a performance penalty, because the data has to be read from disk every time it is needed instead of just once. Some sort of caching strategy can help balance how much memory your program needs against how long it takes.
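The delayed loading described above could look roughly like this. It is a sketch rather than a drop-in replacement: the class `LazyDocumentIndex` and its methods `indexPdfs` and `load` are hypothetical names, it uses `java.nio.file` instead of Commons IO to stay self-contained, and it silently skips PDFs whose names are not numeric, where the original code throws `InvalidFileFormatName`.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

public class LazyDocumentIndex {

    /** Maps folder name -> (document number -> path). No file contents are read here. */
    static Map<String, Map<Integer, Path>> indexPdfs(Path root) throws IOException {
        Map<String, Map<Integer, Path>> index = new HashMap<>();
        try (DirectoryStream<Path> folders = Files.newDirectoryStream(root)) {
            for (Path folder : folders) {
                if (!Files.isDirectory(folder)) {
                    continue;
                }
                Map<Integer, Path> docs = new HashMap<>();
                try (DirectoryStream<Path> files = Files.newDirectoryStream(folder)) {
                    for (Path file : files) {
                        String name = file.getFileName().toString();
                        if (!Files.isRegularFile(file)
                                || !name.toLowerCase(Locale.ROOT).endsWith(".pdf")) {
                            continue;
                        }
                        try {
                            // "123.pdf" -> 123
                            String base = name.substring(0, name.lastIndexOf('.'));
                            docs.put(Integer.parseInt(base), file);
                        } catch (NumberFormatException e) {
                            // Assumption: non-numeric names are skipped in this sketch.
                        }
                    }
                }
                index.put(folder.getFileName().toString(), docs);
            }
        }
        return index;
    }

    /** Reads one document on demand; the caller should not keep the array longer than needed. */
    static byte[] load(Path file) throws IOException {
        return Files.readAllBytes(file);
    }
}
```

With this layout the map holds only paths, so its size depends on the number of files rather than their total size. Only the document currently being processed occupies heap space.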

A quick fix, however, might be to simply store the data as `byte[]` instead of `Byte[]`. In your code that means declaring the nested map as `Map<Integer, byte[]>`, dropping the `ArrayUtils.toObject` call, and putting `readFileToByteArray` into the map directly. In a primitive byte array, each element takes one byte of memory (plus a small fixed array header). In a `Byte[]`, each element is a reference, typically 4 or 8 bytes depending on the JVM, pointing to a separate `Byte` object that carries its own object overhead. According to What is the storage cost for a boxed primitive in Java?, boxing your primitive data requires 16-24 times more memory than before. Even when the boxed instances are shared (`Byte.valueOf` caches all 256 values), the per-element reference alone makes the array several times larger than its primitive counterpart. This is also only a short-term solution: if the combined file size is larger than your heap, even the most efficient representation cannot hold it all, and you will have to go with the delayed-loading approach described above.
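As a minimal sketch of the primitive-array variant, the store below keeps each document as a plain `byte[]` with no boxing step. The class `PrimitiveByteStore` and its methods are hypothetical, and `Files.readAllBytes` stands in for `FileUtils.readFileToByteArray` so the example needs no extra libraries.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.Map;

public class PrimitiveByteStore {

    // The value type is byte[], not Byte[]: one byte of heap per byte of file data.
    private final Map<Integer, byte[]> docs = new HashMap<>();

    /** Reads the file and stores its contents under the given document number. */
    void add(int docNumber, Path file) throws IOException {
        docs.put(docNumber, Files.readAllBytes(file));
    }

    /** Returns the stored contents, or null if the document number is unknown. */
    byte[] get(int docNumber) {
        return docs.get(docNumber);
    }
}
```

This still keeps every file in memory, so it only helps while the combined file size fits into the heap.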

MDK