
I am trying to solve a problem where I have a large CSV file with the following structure:

Dataset: order_id,product_id,add_to_cart_order,reordered

I want a list of product_id for each order_id.

So I am building a `HashMap` (`Map<order_id, HashSet<product_id>>`) while reading the dataset, keeping both `order_id` and `product_id` as `String`. When I try to populate this map, I get a `GC overhead limit exceeded` error.

I know this is not an optimized solution, so please help me with a better approach.

The dataset contains around 90K entries.

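// Assumed declaration (not shown in the original post): order_id -> set of product_id
Map<String, Set<String>> orderProductMap = new HashMap<>();

File file = new File(csvFile);
CsvReader csvReader = new CsvReader();
try (CsvParser csvParser = csvReader.parse(file, StandardCharsets.UTF_8)) {
    CsvRow row;
    while ((row = csvParser.nextRow()) != null) {
        // Create the set the first time an order_id is seen, then add the product_id.
        // This replaces the containsKey/put/get sequence and the raw (Set) cast.
        orderProductMap.computeIfAbsent(row.getField(0), k -> new HashSet<>())
                .add(row.getField(1));
    }
}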
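One direction that may reduce the memory footprint is to store the ids as `Integer` instead of `String`, since each retained `String` carries its own object header and backing array. The following is only a sketch under assumptions the post does not confirm: that both ids are plain decimal integers, that the file starts with a header line, and that no field is quoted. The class name `OrderProducts` and the method `load` are hypothetical, and the sketch reads lines with `BufferedReader` rather than the CSV library used above.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class OrderProducts {

    // Groups product ids by order id, storing both as Integer rather than String.
    // Assumptions: first line is a header, ids are decimal integers, no quoted fields.
    static Map<Integer, Set<Integer>> load(BufferedReader reader) throws IOException {
        Map<Integer, Set<Integer>> byOrder = new HashMap<>();
        String line = reader.readLine(); // skip the header line
        while ((line = reader.readLine()) != null) {
            if (line.isEmpty()) {
                continue;
            }
            // Only the first two columns are needed, so avoid splitting the whole line.
            int firstComma = line.indexOf(',');
            int secondComma = line.indexOf(',', firstComma + 1);
            if (secondComma < 0) {
                secondComma = line.length();
            }
            int orderId = Integer.parseInt(line.substring(0, firstComma));
            int productId = Integer.parseInt(line.substring(firstComma + 1, secondComma));
            // Small initial capacity, in case most orders hold only a few products.
            byOrder.computeIfAbsent(orderId, k -> new HashSet<>(4)).add(productId);
        }
        return byOrder;
    }

    public static void main(String[] args) throws IOException {
        String sample = "order_id,product_id,add_to_cart_order,reordered\n"
                + "1,100,1,0\n"
                + "1,101,2,1\n"
                + "2,100,1,0\n";
        System.out.println(load(new BufferedReader(new StringReader(sample))));
    }
}
```

If the ids turn out not to be numeric, the same structure works with `String` keys, but the saving from boxed integers is lost.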
Dushyant Tankariya
Anuj jain
  • Maybe increase jvm maximum memory with `-Xmx2048m` or `-XX:+UseG1GC`? – Nayfe Jul 15 '19 at 10:45
  • Possible duplicate of [Error java.lang.OutOfMemoryError: GC overhead limit exceeded](https://stackoverflow.com/questions/1393486/error-java-lang-outofmemoryerror-gc-overhead-limit-exceeded) – Dushyant Tankariya Jul 15 '19 at 10:47
  • I have already tried `-Xmx2048m`, but it still did not work. @Dushyant, the links you provided mostly talk about increasing the heap size, but I want to optimize the code itself, since the heap size will always have some limit. – Anuj jain Jul 15 '19 at 10:50
  • Are there any more assumptions you can make about the CSV file? Perhaps all entries for each order are consecutive? If so you could process a new order as soon as the order has been read. Something like `if(order_id != previous_order_id) processOrder(previousOrderId);`. – OldCurmudgeon Jul 15 '19 at 10:56
  • @OldCurmudgeon Let me clarify: I need this in a HashMap because I am using it to join with another dataset. The other dataset I have contains ```Map>``` and I want the final result as ```Map>``` – Anuj jain Jul 15 '19 at 11:00
  • How big is the CSV file? That would give you a very rough idea of how much memory you need. The HashMap and HashSet will also consume some memory. – Guillaume Jul 15 '19 at 11:33
  • small optimization ```if (!orderProductMap.containsKey(row.getField(0))) { orderProductMap.put(row.getField(0), new HashSet<>(2)); }``` helps if you have many unique keys – Vitaly Jul 15 '19 at 11:35
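
The streaming idea from OldCurmudgeon's comment can be sketched as follows. It only holds one order's products in memory at a time, so it applies only if all rows for an order are consecutive in the file, which the post does not confirm. The class name `StreamingOrders`, the method `forEachOrder`, and the callback are hypothetical names for illustration; the sketch also assumes a header line and no quoted fields.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.HashSet;
import java.util.Set;
import java.util.function.BiConsumer;

public class StreamingOrders {

    // Calls handler once per run of consecutive rows that share an order_id.
    // Assumptions: header on the first line, rows for one order are adjacent.
    static void forEachOrder(BufferedReader reader,
                             BiConsumer<String, Set<String>> handler) throws IOException {
        String previousOrderId = null;
        Set<String> products = new HashSet<>();
        String line = reader.readLine(); // skip the header line
        while ((line = reader.readLine()) != null) {
            String[] fields = line.split(",");
            if (fields.length < 2) {
                continue; // ignore blank or malformed lines
            }
            String orderId = fields[0];
            if (previousOrderId != null && !orderId.equals(previousOrderId)) {
                // The order id changed, so the previous order is complete.
                handler.accept(previousOrderId, products);
                products = new HashSet<>();
            }
            previousOrderId = orderId;
            products.add(fields[1]);
        }
        if (previousOrderId != null) {
            handler.accept(previousOrderId, products); // flush the last order
        }
    }

    public static void main(String[] args) throws IOException {
        String sample = "order_id,product_id,add_to_cart_order,reordered\n"
                + "1,100,1,0\n"
                + "1,101,2,1\n"
                + "2,100,1,0\n";
        forEachOrder(new BufferedReader(new StringReader(sample)),
                (orderId, productIds) -> System.out.println(orderId + " -> " + productIds));
    }
}
```

The handler is where the join against the other dataset would happen, one order at a time. If the final result must still be a complete map, this approach does not by itself remove the need to hold that result in memory.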
