OutOfMemoryError when creating a HashMap from a large CSV

Question

I am trying to solve a problem involving a large CSV file with the following structure:

Dataset: order_id,product_id,add_to_cart_order,reordered

I want a list of product_id for each order_id.

So I am building a `HashMap` (`Map<order_id, HashSet<product_id>>`) while reading the dataset, keeping both order_id and product_id as `String`. When I try to populate this map, I get a `GC overhead limit exceeded` error.

I know this is not an optimized solution, so please suggest a better approach.

The dataset contains around 90K entries.

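    // CsvReader, CsvParser and CsvRow are assumed to come from FastCSV 1.x
    import de.siegmar.fastcsv.reader.*;
    import java.io.File;
    import java.nio.charset.StandardCharsets;
    import java.util.*;

    // order_id -> set of product_id, both kept as String
    Map<String, Set<String>> orderProductMap = new HashMap<>();
    File file = new File(csvFile);
    CsvReader csvReader = new CsvReader();
    try (CsvParser csvParser = csvReader.parse(file, StandardCharsets.UTF_8)) {
        CsvRow row;
        while ((row = csvParser.nextRow()) != null) {
            // Create the order's set on first sight (one lookup, no raw cast), then add the product
            orderProductMap.computeIfAbsent(row.getField(0), k -> new HashSet<>())
                           .add(row.getField(1));
        }
    }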
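One way to cut the per-entry cost, assuming both IDs are numeric (the question does not say so), is to parse them into `int` instead of keeping them as `String`: a boxed `Integer` is typically much smaller than a `String` plus its backing character array. The sketch below reads plain comma-separated lines with `BufferedReader.lines()` instead of the CSV library, so it also assumes a header row and no quoted fields. The class name `OrderProducts`, the method `groupByOrder` and the sample rows are hypothetical.

```java
import java.io.BufferedReader;
import java.io.StringReader;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class OrderProducts {

    // Groups product ids by order id, storing both as numbers instead of Strings.
    // Assumptions (not from the question): numeric ids, one header row, no quoted fields.
    static Map<Integer, Set<Integer>> groupByOrder(BufferedReader reader) {
        Map<Integer, Set<Integer>> orderProducts = new HashMap<>();
        reader.lines()
              .skip(1)                          // skip the assumed header row
              .filter(line -> !line.isEmpty())  // ignore blank lines
              .forEach(line -> {
                  String[] fields = line.split(",", 3); // only the first two columns are needed
                  int orderId = Integer.parseInt(fields[0].trim());
                  int productId = Integer.parseInt(fields[1].trim());
                  // Small initial capacity keeps each per-order set's bucket array small
                  orderProducts.computeIfAbsent(orderId, k -> new HashSet<>(4)).add(productId);
              });
        return orderProducts;
    }

    public static void main(String[] args) {
        // Hypothetical sample rows in the question's column layout
        String csv = "order_id,product_id,add_to_cart_order,reordered\n"
                + "1,10,1,0\n"
                + "1,11,2,1\n"
                + "2,10,1,0\n";
        Map<Integer, Set<Integer>> result = groupByOrder(new BufferedReader(new StringReader(csv)));
        System.out.println(result.get(1).size()); // prints 2
    }
}
```

If duplicates within an order cannot occur and product order does not matter, an `ArrayList<Integer>` per order would be lighter still than a `HashSet`.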

Maybe increase the JVM maximum heap with `-Xmx2048m`, or try `-XX:+UseG1GC`? — Nayfe, Jul 15 '19 at 10:45
Possible duplicate of [Error java.lang.OutOfMemoryError: GC overhead limit exceeded](https://stackoverflow.com/questions/1393486/error-java-lang-outofmemoryerror-gc-overhead-limit-exceeded) — Dushyant Tankariya, Jul 15 '19 at 10:47
I have already tried `-Xmx2048m`, but it still did not work. I have seen the links you provided, Dushyant, but they mostly talk about increasing the heap size. I want a better optimization in the code itself, since the heap size still has a limit. — Anuj jain, Jul 15 '19 at 10:50
Are there any more assumptions you can make about the CSV file? Perhaps all entries for each order are consecutive? If so, you could process each order as soon as all of its rows have been read. Something like `if(order_id != previous_order_id) processOrder(previousOrderId);`. — OldCurmudgeon, Jul 15 '19 at 10:56
@OldCurmudgeon Let me clarify: I need this in a HashMap because I am using it to join with another dataset. The other dataset I have is a `Map>`, and I want the final result as a `Map>`. — Anuj jain, Jul 15 '19 at 11:00
How big is the CSV file? That would give you a very rough idea of how much memory you need. The HashMap and HashSet will also consume some memory. — Guillaume, Jul 15 '19 at 11:33
Small optimization: `if (!orderProductMap.containsKey(row.getField(0))) { orderProductMap.put(row.getField(0), new HashSet<>(2)); }` helps if you have many unique keys. — Vitaly, Jul 15 '19 at 11:35
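If, as OldCurmudgeon asks, all rows of an order are consecutive in the file (an assumption the question does not confirm), only one order has to be held in memory at a time. The sketch below groups consecutive rows and hands each finished order to a callback; for the join Anuj describes, that callback could look up the order in the other dataset's map and keep only the joined result (the join itself is not shown). The names `ConsecutiveOrders` and `forEachOrder` and the sample rows are hypothetical, and the parsing assumes a header row and unquoted fields.

```java
import java.io.BufferedReader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.BiConsumer;

public class ConsecutiveOrders {

    // Calls handler once per order, keeping only the current order's products in memory.
    // Assumptions: rows of one order_id are consecutive, one header row, no quoted fields.
    static void forEachOrder(BufferedReader reader, BiConsumer<String, List<String>> handler) {
        String currentOrder = null;
        List<String> products = new ArrayList<>();
        Iterator<String> lines = reader.lines().skip(1).iterator(); // skip the header row
        while (lines.hasNext()) {
            String line = lines.next();
            if (line.isEmpty()) {
                continue;
            }
            String[] fields = line.split(",", 3);
            String orderId = fields[0];
            if (currentOrder != null && !currentOrder.equals(orderId)) {
                handler.accept(currentOrder, products); // previous order is complete
                products = new ArrayList<>();
            }
            currentOrder = orderId;
            products.add(fields[1]);
        }
        if (currentOrder != null) {
            handler.accept(currentOrder, products); // flush the last order
        }
    }

    public static void main(String[] args) {
        String csv = "order_id,product_id,add_to_cart_order,reordered\n"
                + "1,10,1,0\n"
                + "1,11,2,1\n"
                + "2,10,1,0\n";
        // prints "1 -> [10, 11]" then "2 -> [10]"
        forEachOrder(new BufferedReader(new StringReader(csv)),
                (orderId, productIds) -> System.out.println(orderId + " -> " + productIds));
    }
}
```

Memory use then depends on the largest single order rather than on the whole file, but the approach silently produces split groups if the consecutive-rows assumption does not hold.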