I read a 1.3 GB text file line by line, extract and format the content to fit my needs, and save it to a new text file.
Originally I just used the main thread, but the extracting and formatting take a lot of CPU time, so I wanted to accelerate that with multithreading.
But this is what my profiler shows: the garbage collector time rises to 100% as soon as I start using multiple threads, and java.lang.OutOfMemoryError: GC overhead limit exceeded errors are thrown.
I have a function that processes a single line, and I execute it within a newFixedThreadPool. It doesn't matter whether I assign one thread or four to the pool.
Using different profilers I can't find out which code causes the problem, and I don't understand why my GC time is at 0.0% when I only use the main thread.
Does anyone have an idea without having a look at the code?
Update: I tried to abstract some code:
A.java
ExecutorService executor = Executors.newFixedThreadPool(4);
String line;
while((line = reader.readLine()) != null) {
    Runnable processLine = new Runnable() {
        private String line;

        // capture the current line; an anonymous class can't
        // reference the changing local variable directly
        private Runnable init(String line) {
            this.line = line;
            return this;
        }

        @Override
        public void run() {
            processLine(line); // @B.java
        }
    }.init(line);
    executor.execute(processLine);
}
B.java
public void processLine(String line) {
    String[][] outputLines = new String[x][y];
    String field;
    for(... x ...) {
        for(... y ...) {
            field = extractField(line); // @C.java
            ...
            outputLines[x][y] = formatField(field); // @C.java
        }
    }
    write(outputLines); // write the generated lines to BufferedWriter(s)
}
C.java
public String extractField(String line) {
    if(filetype.equals("csv")) {
        String[] splitLine = line.split(";");
        return splitLine[position];
    }
    ...
}
public String formatField(String field) {
    if(trim) {
        field = field.trim();
    }
    ...
}