
I have multiple huge files (around 10-12 files, 1-2 GB each) that need to be downloaded every hour. I store them in a HashMap, where the key is the file version and the value is the file contents as a list of strings. The problem we face is that every hour, when the new files are downloaded, the GC kicks in and cleans up the old files, which causes long pauses for the system because the files are huge. I am thinking of a solution where we store these file contents off-heap. For this, we explored Chronicle Map.
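Roughly, the current layout looks like this (class and method names are illustrative, not our real code):

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class FileCache {
    // key: file version, value: file contents as a list of strings (all on-heap)
    private final Map<String, List<String>> filesByVersion = new HashMap<>();

    void refresh(String version, List<String> lines) {
        // Replacing the old value turns 1-2 GB of strings into garbage at once,
        // which is what triggers the long hourly GC pauses.
        filesByVersion.put(version, lines);
    }
}
```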

Question 1: While downloading, does the OS or the JVM use any on-heap buffers or data structures? If so, it does not matter whether I store the file off-heap, since on-heap memory has already been allocated. Is there a way to download and store files off-heap while using no on-heap memory at all for the file contents?
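To make Question 1 concrete, here is the kind of thing I mean: a sketch (untested) that reads a raw stream into a direct ByteBuffer via NIO, so the payload bytes land in native memory rather than on the heap. I realize a real HTTP client may still allocate its own heap buffers internally, which is exactly what I am unsure about:

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.SocketChannel;

class OffHeapDownload {
    // Read a stream straight into a direct (off-heap) ByteBuffer.
    // The JDK and the OS may still use small internal buffers, but the
    // bulk of the payload never lives on the Java heap.
    static ByteBuffer download(InetSocketAddress server, int expectedSize) throws IOException {
        ByteBuffer buf = ByteBuffer.allocateDirect(expectedSize); // off-heap allocation
        try (SocketChannel ch = SocketChannel.open(server)) {
            while (buf.hasRemaining() && ch.read(buf) != -1) {
                // keep reading until the buffer is full or the peer closes
            }
        }
        buf.flip(); // prepare the buffer for reading
        return buf;
    }
}
```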

Question 2: Is there a way to store files off-heap and just keep a reference to that memory as a value in my HashMap, thus avoiding a special data structure like Chronicle Map?
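What I have in mind for Question 2 is something like the following sketch: the HashMap itself stays tiny and on-heap, and each value is just a reference to a direct buffer whose contents live in native memory:

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

class OffHeapFileCache {
    // The map is small and on-heap; the bulky contents are off-heap.
    private final Map<String, ByteBuffer> filesByVersion = new HashMap<>();

    void put(String version, byte[] contents) {
        // 'contents' is a transient on-heap chunk; the long-lived copy is off-heap.
        ByteBuffer buf = ByteBuffer.allocateDirect(contents.length);
        buf.put(contents).flip();
        filesByVersion.put(version, buf);
    }

    String readAll(String version) {
        // Decoding back to a String allocates on-heap, so the real system
        // would decode in chunks rather than all at once.
        ByteBuffer buf = filesByVersion.get(version).duplicate();
        return StandardCharsets.UTF_8.decode(buf).toString();
    }
}
```

One caveat I am aware of: a direct buffer's native memory is only released when the ByteBuffer object itself is garbage-collected (via its Cleaner), so reclaiming replaced entries still involves the GC, though the pauses should shrink because the heap-side objects are tiny.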

Sarthak Agarwal
  • Sometimes it helps to give more context... Do you actually use the files / have you thought about downloading them to a file somewhere instead? If you are trying to detect changes, perhaps there are other ways (store an MD5 in the map)? 1-2 GB x 10-12 seems like a huge amount of RAM to be allocating for what?? – Mr R Mar 23 '21 at 05:22
  • We don't have any use case for downloading them to a file. We have to read through the list of strings, but we are ready to sacrifice latency here. We just want to avoid the long GC pauses. The numbers I provided are worst-case; usually the file sizes will be around 300 MB, and we may want to download 4-6 files. – Sarthak Agarwal Mar 23 '21 at 13:08
  • So @Sarthak Agarwal do you mean read through over and over, or process once? The former might justify bringing it into memory; the latter is a save-to-a-temp-file-and-process-from-there sort of scenario. You can also filter the data down / do some preprocessing prior to use. Once saved as files, you could consider memory-mapped I/O (https://stackoverflow.com/questions/22153377/java-nio-memory-mapped-files), where a large chunk of the file can be mapped into memory (as a byte bucket) off the heap, and that "window" into the file is easily moved while still being relatively high-performance (a sketch follows these comments). – Mr R Mar 23 '21 at 18:41
  • Would processing the input as you download it be an option? Then the data would only have to be buffered (e.g. by means of a `BufferedReader`) and not fully stored in memory. If there are processing dependencies between the files, you could reduce the data set by extracting only the required bits and continuing with processing once all files have been downloaded (and preprocessed). – horstr Apr 03 '21 at 15:42
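For reference, a minimal sketch of the memory-mapped approach suggested in the comments, assuming the file has first been saved to disk. Names are illustrative; note that a single mapping is capped at Integer.MAX_VALUE bytes (about 2 GB), so larger files need the window moved or several mappings:

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

class MappedReader {
    // Map a window of an already-downloaded file into native memory.
    // The mapping is backed by the OS page cache, not the Java heap,
    // and remains valid after the channel is closed.
    static MappedByteBuffer mapWindow(Path file, long offset, int size) throws IOException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            return ch.map(FileChannel.MapMode.READ_ONLY, offset, size);
        }
    }
}
```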

0 Answers