I have an external program which generate some data I need. Usually, I redirect its output to a file, then read it from my Scala application, e.g.
app.exe > output.data
Now, I want to integrate the process, so I did
val stream = "app.exe" lineStream
stream foreach { line => doWork(_) }
Unfortunately, I got GC overhead exception after a while. This app.exe
may generate very large output files, e.g. over 100MB. So I think during the streaming, Scala has been creating/destroying the line
string instance thousands of times, and cause the overhead.
I know I can tune the JVM variables to increase the GC overhead throttling. But I am looking for a way that it doesn't need to create a lot of small line
instances.