I'm reading some very large text files and running into the error described in Java - Char Buffer Issue. When I have a very large file (>1 GB), Charset.defaultCharset().decode(ByteBuffer bb).toString() throws an IllegalArgumentException, presumably because the buffer capacity overflows and becomes a negative number.
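To illustrate the arithmetic I suspect is at fault (this is only a guess at the failure mode, not the decoder's actual internals), a byte count larger than Integer.MAX_VALUE wraps to a negative value when narrowed to an int, and CharBuffer.allocate() rejects negative capacities with exactly this exception:

import java.nio.CharBuffer;

public class OverflowDemo {
    public static void main(String[] args) {
        // Hypothetical byte count larger than Integer.MAX_VALUE (~2.8 GB)
        long byteCount = 3_000_000_000L;
        // Narrowing to int wraps around to a negative value
        int capacity = (int) byteCount;   // -1294967296
        // CharBuffer.allocate rejects negative capacities
        CharBuffer.allocate(capacity);    // throws IllegalArgumentException
    }
}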
Here's the slurp function I've been using:
public static String slurp(File f) throws IOException, FileNotFoundException
{
    FileInputStream fis = new FileInputStream(f);
    try {
        FileChannel fc = fis.getChannel();
        MappedByteBuffer bb = fc.map(FileChannel.MapMode.READ_ONLY, 0, fc.size());
        // The decode call on the following line throws the IllegalArgumentException
        return Charset.defaultCharset().decode(bb).toString();
    } finally {
        fis.close();
    }
}
I'd like to add error handling to this function so that, when the exception is thrown, it falls back to an alternative, safer method, such as the pattern from the question statement in How do I create a Java string from the contents of a file? For example:
public static String slurp(File f) throws IOException, FileNotFoundException
{
    FileInputStream fis = new FileInputStream(f);
    try {
        FileChannel fc = fis.getChannel();
        MappedByteBuffer bb = fc.map(FileChannel.MapMode.READ_ONLY, 0, fc.size());
        return Charset.defaultCharset().decode(bb).toString();
    } catch (IllegalArgumentException e) {
        // This exception is thrown for extremely large files; fall back to a line-by-line read
        BufferedReader reader = new BufferedReader(new FileReader(f));
        try {
            String line = null;
            StringBuilder stringBuilder = new StringBuilder();
            String ls = System.getProperty("line.separator");
            while ((line = reader.readLine()) != null) {
                stringBuilder.append(line);
                stringBuilder.append(ls);
            }
            return stringBuilder.toString();
        } finally {
            reader.close();
        }
    } finally {
        fis.close();
    }
}
An alternative would be to use the most memory-efficient answer proposed on the same question:
public static String slurp(File f) throws IOException, FileNotFoundException
{
    FileInputStream fis = new FileInputStream(f);
    try {
        FileChannel fc = fis.getChannel();
        MappedByteBuffer bb = fc.map(FileChannel.MapMode.READ_ONLY, 0, fc.size());
        return Charset.defaultCharset().decode(bb).toString();
    } catch (IllegalArgumentException e) {
        // This exception is thrown for extremely large files; fall back to Files.readAllLines
        List<String> lines = Files.readAllLines(f.toPath(), Charset.defaultCharset());
        return String.join("\n", lines);
    } finally {
        fis.close();
    }
}
Any large file is going to be cumbersome in memory when slurped rather than streamed, but is there any reason to prefer one of these two methods, or something else altogether? I ask because the accepted answer to that question discusses the memory utilization of the two answers' solutions but not the questioner's example pattern.
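For reference, when I say "streamed," I have in mind something like the following chunked read into a StringBuilder (an untested sketch; the method name and the 8K buffer size are arbitrary), which avoids both the mapped buffer and the per-line splitting:

public static String slurpStreaming(File f) throws IOException
{
    // Untested sketch: read the file in fixed-size char chunks rather than
    // mapping it or splitting it into lines.
    Reader reader = new InputStreamReader(new FileInputStream(f), Charset.defaultCharset());
    try {
        StringBuilder sb = new StringBuilder();
        char[] buf = new char[8192];
        int n;
        while ((n = reader.read(buf)) != -1) {
            sb.append(buf, 0, n);
        }
        return sb.toString();
    } finally {
        reader.close();
    }
}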