
I am trying to process a large CSV file of approximately 1 million records. After reading the rows (line by line or in chunks), I need to push them to camel-flatpack to create a map of field names and their values.

My requirement is to feed all the CSV records to a flatpack config and generate a java.util.Map out of it.

Several posts on Stack Overflow suggest solving this with a splitter, and my process runs fast until roughly 35,000 records, but after that it slows down.

I even tried adding a throttler, but it still doesn't work: I get a GC out-of-memory error. I also increased JAVA_MIN_MEM, JAVA_MAX_MEM, JAVA_PERM_MEM and JAVA_MAX_PERM_MEM, but the result is the same. The Hawtio console shows heap usage above 95% after about 5-6 minutes.

Here is my code snippet:

    <route id="poller-route"> 
        <from uri="file://temp/output?noop=true&amp;maxMessagesPerPoll=10&amp;delay=5000"/>
        <split streaming="true" stopOnException="false">            
            <tokenize token="\n" />
            <to uri="flatpack:delim:flatpackConfig/flatPackConfig.pzmap.xml?ignoreFirstRecord=false"/>              
        </split>
    </route>

    <route id="output-route">
        <from uri="flatpack:delim:flatpackConfig/flatPackConfig.pzmap.xml?ignoreFirstRecord=false"/>
        <convertBodyTo type="java.util.Map"/>
        <to uri="mock:result"/>
    </route>
  • How big are the objects you're adding to the Map, and how much heap have you allocated to the JVM? Presumably you are running out of memory simply because you're trying to add more bytes to the Map than you've got heap space for. – matt helliwell Aug 25 '14 at 06:57

1 Answer


One potential problem is that when you create a hash map and continuously add data to it, it has to rebuild the hash table as it grows. For example, if I have a table of size 3 with a hash function of mod 3 and I insert 0, 1, 2, 3, then 3 also maps to slot 0, creating an overflow, so I either have to store the overflow or rebuild a larger table.

I'm fairly sure this is how Java implements its HashMap, so you could try setting your HashMap's initial capacity to the number of records you expect.
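
If you want to try that, here is a minimal sketch of pre-sizing the map (the 1 million figure is taken from the question; the class, variable names, and sample entry are only illustrative):

    import java.util.HashMap;
    import java.util.Map;

    public class PresizedMapSketch {
        public static void main(String[] args) {
            // Figure taken from the question: roughly 1 million CSV records.
            int expectedRecords = 1_000_000;

            // HashMap rehashes all entries whenever its size exceeds
            // capacity * loadFactor (default 0.75). Sizing the table up front so
            // the expected entry count stays below that threshold means it never
            // has to be resized while the file is being loaded.
            int initialCapacity = (int) (expectedRecords / 0.75f) + 1;
            Map<String, Object> records = new HashMap<>(initialCapacity);

            // ... populate the map from the parsed CSV rows
            records.put("row-1", "field values for the first record");
        }
    }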
