I've profiled my application and it seems like one of my biggest bottlenecks at the moment is the String.split
method. It's taking up 21% of my runtime, and the other big contributors aren't parts that I can streamline anymore than they are. It also seems like all of the newly-created String
objects are causing issues with the garbage collector, although I'm less clear whether or not that's the case.
I'm reading in a gzipped file comma-separated values that contain financial data. The number of fields in each row varies depending on what kind of record it is, and the size of each field varies too. What's the fastest way to read the data in, creating the fewest intermediate objects?
I saw this thread but none of the answers give any evidence that OpenCSV is any faster than String.split
, and they all seem to focus on using an external library rather than writing new code. I'm also very concerned about memory overhead, because I spend another 20% or so of the total runtime doing garbage collection. I would like to just return views of the string in question, but it looks like that's not possible anymore.