
Hi, I have the following program structure in a Spring Boot project:

In a service class:

File 1 is loaded into arrayList1 (a list of POJOs).

File 2 is loaded into arrayList2 (a list of POJOs).

File 3 is loaded into arrayList3 (a list of POJOs).

The input file is parsed and loaded into an ArrayList, and an output ArrayList is built by iterating over the input records:

    for (each input record in the input ArrayList) {

        // for output field 1
        for (each record in the file1 list) {
            if (the field is available in file1) {
                assign the output column;
            } else {
                reject the record;
            }
        }

        // for output field 2
        for (each record in the file2 list) {
            if (the field is available in file2) {
                assign the output column;
            } else {
                reject the record;
            }
        }

        // for output field 3
        for (each record in the file3 list) {
            if (the field is available in file3) {
                assign the output column;
            } else {
                reject the record;
            }
        }

        // assign the remaining output fields directly from the input fields
        outputField4 = inputField4;
        outputField5 = inputField5;
        outputField6 = inputField6;
        outputField7 = inputField7;
        outputField8 = inputField8;

        outputList.add(the output POJO);
    }

While reading File 2, which is about 2 GB, the process hangs or throws an OutOfMemoryError. I am completely stuck on this; please help with this problem. Thank you.

sat1219
  • What are your JVM settings for `-Xms` and `-Xmx`? However, a 2 GB file is pretty large. It is possible that attempting to read the entire thing into memory will be problematic, so approaches that can process it more piecemeal would be worth considering. – KevinO Nov 17 '18 at 02:45
  • I have to wonder if this is an [XY Problem](http://xyproblem.info), i.e., if your overall approach to a solution is wrong. Have you considered using a database instead? – Hovercraft Full Of Eels Nov 17 '18 at 02:48
  • @KevinO Thanks for the reply. I have tried the -Xmx option with 5000m, but the issue is the same. However, the same code that reads the file and stores it in an ArrayList runs fine in a standalone test class; only when it is called from a service class in the Spring Boot project do I get this issue. Please reply. – sat1219 Nov 17 '18 at 19:16

1 Answer


When dealing with large inputs or outputs, the best approach is to chunk the work. For example, you could cap each ArrayList at a maximum size of something like 10,000 entries and process the data one chunk at a time, as sketched below.
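As a minimal sketch of that idea (assuming File 2 is a plain text file with one record per line; the class name, file path, and `processChunk` placeholder are illustrative, not the asker's actual code):

    import java.io.BufferedReader;
    import java.io.IOException;
    import java.nio.file.Files;
    import java.nio.file.Paths;
    import java.util.ArrayList;
    import java.util.List;

    public class ChunkedFileProcessor {

        private static final int CHUNK_SIZE = 10_000;

        public static void main(String[] args) throws IOException {
            List<String> chunk = new ArrayList<>(CHUNK_SIZE);
            // Stream the file line by line instead of loading all 2 GB into one list
            try (BufferedReader reader = Files.newBufferedReader(Paths.get("file2.txt"))) {
                String line;
                while ((line = reader.readLine()) != null) {
                    chunk.add(line);
                    if (chunk.size() == CHUNK_SIZE) {
                        processChunk(chunk);
                        chunk.clear(); // release processed lines before reading more
                    }
                }
            }
            if (!chunk.isEmpty()) {
                processChunk(chunk); // handle the final partial chunk
            }
        }

        // Placeholder for the per-record parse/lookup/assign logic from the question
        private static void processChunk(List<String> lines) {
            System.out.println("Processing " + lines.size() + " lines");
        }
    }

This keeps at most CHUNK_SIZE lines in memory at a time, so the heap requirement no longer scales with the file size.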

However, given your file size, I feel you would be better off using a database rather than trying to hold such large inputs in memory (see the sketch below). You should rethink your approach.
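As a rough illustration of the embedded-database idea (a sketch only, using plain JDBC against H2; the JDBC URL, table, and column names are made up for the example, and the H2 driver must be on the classpath):

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;
    import java.sql.SQLException;

    public class H2LookupSketch {
        public static void main(String[] args) throws SQLException {
            // File-based H2 database persisted on the local disk, no server needed
            try (Connection conn = DriverManager.getConnection("jdbc:h2:./lookupdb", "sa", "")) {
                try (PreparedStatement create = conn.prepareStatement(
                        "CREATE TABLE IF NOT EXISTS file2_records(" +
                        "record_key VARCHAR(64) PRIMARY KEY, record_value VARCHAR(255))")) {
                    create.execute();
                }
                // Load rows here while streaming File 2, then do indexed lookups
                // per input record instead of scanning an in-memory ArrayList.
                try (PreparedStatement lookup = conn.prepareStatement(
                        "SELECT record_value FROM file2_records WHERE record_key = ?")) {
                    lookup.setString(1, "someKey");
                    try (ResultSet rs = lookup.executeQuery()) {
                        if (rs.next()) {
                            System.out.println("assign output column: " + rs.getString(1));
                        } else {
                            System.out.println("reject record");
                        }
                    }
                }
            }
        }
    }

Because the lookup column is the primary key, each probe is an indexed read from disk rather than a scan of a 2 GB in-memory list.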

John Kim
  • @John Database is one option, but is there any other workaround on the Java side so that I don't have to recode everything? Please help with this. – sat1219 Nov 17 '18 at 19:15
  • The only other workaround is to increase the JVM heap size: https://stackoverflow.com/questions/6452765/how-to-increase-heap-size-of-jvm (see the flag example after these comments). However, I'm telling you right now: if your files are 2 GB in size, you definitely need a database. There's no reason to keep such large ArrayLists in memory. I suggest using an embedded database such as H2 (you can save the database locally on your hard drive). – John Kim Nov 18 '18 at 02:12
  • @John Thanks for your reply. I will take another look at using a database; I might need to do a bigger refactor at this 11th hour :( . Also, just FYI, we are currently using Redshift, which is where we ultimately store the processed file data. – sat1219 Nov 18 '18 at 17:54
  • @John Can you please elaborate on the point "embedded database such as H2"? – sat1219 Nov 18 '18 at 22:35
  • @sat1219 See this video: https://www.youtube.com/watch?v=1eedQuB4v6Y Essentially, an embedded database is a database that is embedded into the application without having to connect to a server; you can store the database locally on your hard drive. I cannot walk you through everything involved in implementing an H2 database, but there are many resources out there, including the link I provided in this comment. Research how to use an H2 database and implement it in your code. – John Kim Nov 19 '18 at 05:13
  • Thank you John for sharing the info. – sat1219 Nov 20 '18 at 00:58
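For reference on the heap-size comment above, the flags are passed when the JVM is launched; the sizes here are placeholders, not a recommendation:

    java -Xms512m -Xmx4g -jar myapp.jar

As noted in the comments, though, raising the heap only postpones the problem once a file grows past the new limit.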