
I am trying to read a huge file which has approximately one billion lines. I want to use a Stream for parallel processing and insert each line into a database. I am doing something like this:

BufferedReader br = new BufferedReader(new InputStreamReader(inputStream, "UTF-8"));
List<String> list = br.lines().parallel().collect(Collectors.toList());

This stores all the lines in a list, but I don't want to keep all the lines in memory. Instead, I want to save each line to the database as soon as it is read. Please help me achieve this, and guide me in tweaking this idea.

Thanks in advance :)

Praveen Kumar
    This is going to be hugely inefficient compared to regular batching without parallelism. – Kayaman Aug 02 '18 at 05:25
  • Agree with @Kayaman, you may try to slice your stream into batches, see [this answer](https://stackoverflow.com/a/31642381/2753863), and then do the batch DB insert (a sketch of that approach follows these comments). – Vladimir Vagaytsev Aug 02 '18 at 07:03
  • @VladimirVagaytsev I am actually writing a Spring Batch application and I'm reading the file inside a tasklet. On my local machine the entire process of reading and inserting completes in 500000ms. However, the job itself takes much longer to finish and commit the transaction. Any idea how to fix this? – Praveen Kumar Aug 02 '18 at 16:29
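
To make the batching suggestion in the comments concrete, here is a minimal sketch using plain JDBC addBatch/executeBatch. The table name lines_table, its column, and the batch size are assumptions for illustration, not anything from the question:

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

public class BatchInserter {

    private static final int BATCH_SIZE = 10_000; // assumption: tune for your driver/DB

    // Reads the stream line by line and flushes a batch every BATCH_SIZE lines,
    // so at most BATCH_SIZE lines are buffered in memory at any time.
    public static void insertAll(InputStream in, Connection conn) throws IOException, SQLException {
        String sql = "INSERT INTO lines_table (line) VALUES (?)"; // hypothetical table/column
        conn.setAutoCommit(false); // commit once per batch rather than once per row
        try (BufferedReader br = new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8));
             PreparedStatement ps = conn.prepareStatement(sql)) {
            int count = 0;
            String line;
            while ((line = br.readLine()) != null) {
                ps.setString(1, line);
                ps.addBatch();
                if (++count % BATCH_SIZE == 0) {
                    ps.executeBatch();
                    conn.commit();
                }
            }
            ps.executeBatch(); // flush the remainder
            conn.commit();
        }
    }
}

Sequential reading with batched inserts like this usually outperforms per-line parallel inserts, because the bottleneck is the database round-trip rather than the CPU.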

1 Answer


It seems you need to use forEach and pass a Consumer that takes the line and stores it in the database:

lines.parallel()
     .forEach(line -> {
         // Invoke the code that persists 'line' in the DB, something like:
         dbWriter.write(line);
     });
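
The answer leaves dbWriter undefined; here is a minimal sketch of what it might look like with plain JDBC. The class name, table, and column are assumptions, and write is synchronized because a parallel stream invokes the consumer from multiple threads while a PreparedStatement is not thread-safe:

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

// Hypothetical class behind dbWriter.write(line) in the snippet above.
class DbWriter {

    private final PreparedStatement ps;

    DbWriter(Connection conn) throws SQLException {
        // Assumed table/column; adjust to your schema.
        this.ps = conn.prepareStatement("INSERT INTO lines_table (line) VALUES (?)");
    }

    // synchronized because parallel() invokes the consumer from several worker
    // threads at once, and a PreparedStatement is not thread-safe.
    synchronized void write(String line) {
        try {
            ps.setString(1, line);
            ps.executeUpdate();
        } catch (SQLException e) {
            throw new RuntimeException(e);
        }
    }
}

Note that this still makes one round-trip to the database per line, and the synchronization serializes the writes anyway, which is exactly why the comments above recommend sequential reading with batched inserts.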
Thiyagu