I have a CSV file with 12000 rows. Each row has several fields enclosed in double quotes and separated by commas. One of these fields is an XML document, so a row can be very long. The file size is 174 MB.
Here is an example of the file:
"100000","field1","field30","<root><data>Hello I have a
line break</data></root>","field31"
"100001","field1","field30","<root><data>Hello I have multiple
line
break</data></root>","field31"
The problem is the XML field, which can contain one or more line breaks and therefore breaks line-based parsing. The goal is to read the whole file and apply a regex that replaces every line break inside double quotes with an empty string.
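To illustrate the intended transformation on a small in-memory string: find each quoted field and strip the line breaks inside it. This is only a sketch of the replacement logic, assuming fields never contain escaped quotes; the class and method names are illustrative, and it does not address the memory problem:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class StripNewlines {
    // Matches one double-quoted field; [^"]* also matches line breaks,
    // so a field spanning several lines is captured as a single match.
    // Assumes quotes are never escaped inside a field.
    private static final Pattern QUOTED = Pattern.compile("\"[^\"]*\"");

    static String stripNewlinesInsideQuotes(String csv) {
        Matcher m = QUOTED.matcher(csv);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            // \R matches any line-break sequence (\n, \r\n, ...); remove it
            // only within the quoted field.
            String joined = m.group().replaceAll("\\R", "");
            m.appendReplacement(out, Matcher.quoteReplacement(joined));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String sample = "\"100000\",\"<root><data>Hello I have a\nline break</data></root>\",\"field31\"";
        System.out.println(stripNewlinesInsideQuotes(sample));
    }
}
```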
The following code gives me OutOfMemoryError:
import java.nio.file.Files;
import java.nio.file.Paths;

String path = "path/to/file.csv";
try {
    byte[] content = Files.readAllBytes(Paths.get(path));
} catch (Exception e) {
    e.printStackTrace();
    System.exit(1);
}
I've also tried reading the file with a BufferedReader and appending each line to a StringBuilder; this throws OutOfMemoryError around line 5000:
import java.io.BufferedReader;
import java.io.FileReader;

String path = "path/to/file.csv";
StringBuilder sb = new StringBuilder();
try (BufferedReader br = new BufferedReader(new FileReader(path))) {
    String line;
    int count = 0;
    while ((line = br.readLine()) != null) {
        sb.append(line);
        System.out.println("Read " + count++);
    }
} catch (Exception e) {
    e.printStackTrace();
    System.exit(1);
}
I've run both programs with different heap sizes, such as -Xmx1024m, -Xmx4096m, and -Xmx8092m. In every case I got OutOfMemoryError. Why is this happening, given that the file is only 174 MB?
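One thing I'm considering as an alternative is to avoid buffering the whole file at all: read line by line, count double quotes, and only emit a record once its quotes balance, so a line break inside an unclosed quoted field is simply dropped. A minimal sketch, again assuming quotes are never escaped inside fields (names here are illustrative):

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.Reader;
import java.io.Writer;

public class StreamingFix {
    // Streams the input record by record; memory use is bounded by the
    // longest logical record, not by the file size.
    static void fix(Reader in, Writer out) throws IOException {
        BufferedReader br = new BufferedReader(in);
        BufferedWriter bw = new BufferedWriter(out);
        StringBuilder record = new StringBuilder();
        int quotes = 0;
        String line;
        while ((line = br.readLine()) != null) {
            record.append(line);       // the line break itself is dropped
            quotes += countQuotes(line);
            if (quotes % 2 == 0) {     // quotes balanced: logical record ends
                bw.write(record.toString());
                bw.write('\n');
                record.setLength(0);
                quotes = 0;
            }
            // odd quote count: still inside a quoted field, keep buffering
        }
        bw.flush();
    }

    private static int countQuotes(String s) {
        int n = 0;
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) == '"') n++;
        }
        return n;
    }
}
```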