I am using a parallel stream to process a large dataset, but it gives inconsistent results. The database is Postgres. The data is hierarchical, with defined levels.

For example, the hierarchy has 5 levels. I process the lowest-level nodes (level 5 here) first and persist them to the DB. Then, while processing the level above (level 4 here), I have to fetch the data already saved for level 5, process it, and save the level 4 data to the DB.
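As a rough illustration of the bottom-up flow described above, here is a minimal sketch. It is not the code from my project: `LevelRollupSketch`, `Node`, `rollUp`, and `sampleTree` are hypothetical names, a `ConcurrentHashMap` stands in for the Postgres table, and it assumes every child sits exactly one level below its parent.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class LevelRollupSketch {

    // Hypothetical node shape: id, level in the hierarchy, parent id, own value.
    public static final class Node {
        final int id;
        final int level;
        final Integer parentId; // null for the root
        final long value;

        public Node(int id, int level, Integer parentId, long value) {
            this.id = id;
            this.level = level;
            this.parentId = parentId;
            this.value = value;
        }
    }

    // Processes levels from the deepest up to level 1. Each level runs in
    // parallel, but forEach only returns once the whole level is done, so the
    // next level always starts after the previous one has been "saved".
    public static Map<Integer, Long> rollUp(List<Node> nodes, int deepestLevel) {
        Map<Integer, Long> saved = new ConcurrentHashMap<>(); // stand-in for the DB
        for (int level = deepestLevel; level >= 1; level--) {
            final int current = level;
            nodes.parallelStream()
                 .filter(n -> n.level == current)
                 .forEach(n -> {
                     // "Fetch" the totals saved for this node's children.
                     long childTotal = nodes.stream()
                         .filter(c -> c.parentId != null && c.parentId == n.id)
                         .mapToLong(c -> saved.get(c.id))
                         .sum();
                     saved.put(n.id, n.value + childTotal);
                 });
        }
        return saved;
    }

    // Small made-up tree with 3 levels, used only for the demonstration.
    public static List<Node> sampleTree() {
        return Arrays.asList(
            new Node(1, 1, null, 1),
            new Node(2, 2, 1, 10),
            new Node(3, 2, 1, 20),
            new Node(4, 3, 2, 100),
            new Node(5, 3, 2, 200),
            new Node(6, 3, 3, 300));
    }

    public static void main(String[] args) {
        System.out.println(rollUp(sampleTree(), 3));
    }
}
```

With a shared in-memory map this ordering is enough. With a real database, the writes made for one level also have to be committed before another connection can read them for the next level.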

I use parallelStream() to process each level. Once level 5 processing has completed and we try to fetch that data while processing the level 4 nodes, the saved data is not visible.

When I remove "parallelStream()" from the code below, everything works fine, but it takes too much time.

  • By using parallel streams you mess up your transactions which are bound to threads. You probably shouldn't use JPA for batch processing anyway. Instead formulate your logic as SQL statements, processing large sets of rows at once and execute them using a `JdbcTemplate` or maybe even write a stored procedure. – Jens Schauder Jan 07 '21 at 05:58
  • Rewriting away from JPA would be a tedious and time-consuming task at this point. I am looking for any other option to sort this issue out. – Praveen Kumar Jan 07 '21 at 13:45
  • @Praveen Kumar, any solution? – Heril Muratovic May 12 '21 at 15:54
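
The first comment's point that transactions are bound to threads can be illustrated without any database. The sketch below assumes the transaction context lives in thread-local state, as that comment implies. A plain `ThreadLocal` stands in for it, and `ThreadBoundContextDemo`, `CONTEXT`, `observe`, and the value `"tx-1"` are made-up names for illustration. It records, per thread, what each stream element sees.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class ThreadBoundContextDemo {

    // Hypothetical stand-in for a thread-bound transaction context.
    private static final ThreadLocal<String> CONTEXT = new ThreadLocal<>();

    // Binds a context to the calling thread, runs the stream, and returns
    // thread name -> context seen on that thread ("none" if nothing is bound).
    public static Map<String, String> observe(List<Integer> nodes, boolean parallel) {
        Map<String, String> seenByThread = new ConcurrentHashMap<>();
        CONTEXT.set("tx-1");
        try {
            (parallel ? nodes.parallelStream() : nodes.stream()).forEach(node -> {
                String ctx = CONTEXT.get();
                seenByThread.put(Thread.currentThread().getName(),
                                 ctx == null ? "none" : ctx);
            });
        } finally {
            CONTEXT.remove(); // always unbind from the calling thread
        }
        return seenByThread;
    }

    public static void main(String[] args) {
        List<Integer> nodes =
            IntStream.range(0, 10_000).boxed().collect(Collectors.toList());
        System.out.println("sequential: " + observe(nodes, false));
        System.out.println("parallel:   " + observe(nodes, true));
    }
}
```

In the sequential run every element is handled on the calling thread and sees the bound context. In the parallel run, any element handled by a pool worker sees no context at all, which mirrors how work done on worker threads falls outside the caller's transaction. Which worker threads take part varies from run to run.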

0 Answers