9

Is there a cleaner and faster way to do this:

BufferedReader inputReader = new BufferedReader(new InputStreamReader(context.openFileInput("data.txt")));
String inputString;
StringBuilder stringBuffer = new StringBuilder();
while ((inputString = inputReader.readLine()) != null) {
    stringBuffer.append(inputString + "\n");
}
text = stringBuffer.toString();
byte[] data = text.getBytes();

Basically I'm trying to convert a file into a byte[], except if the file is large enough I run into an OutOfMemoryError. I've been looking around SO for a solution; I tried to do this here, and it didn't work. Any help would be appreciated.

eWizardII
    There are lots of good thoughts on the issue in that thread. – keyser Jan 30 '13 at 07:40
  • I tried to implement the actual answer; the only problem is, what do I do with mbb then? Is that already a byte[]? – eWizardII Jan 30 '13 at 07:41
  • It's confusing to name a StringBuilder stringBuffer, since a [StringBuffer](http://docs.oracle.com/javase/1.5.0/docs/api/java/lang/StringBuffer.html) is a thread safe version of StringBuilder. Just saying. – keyser Jan 30 '13 at 07:51

7 Answers

6

A few suggestions:

  1. You don't need to build a String at all; you can read the bytes directly from the file (see the sketch below).
  2. If you read multiple files, check whether those byte[] arrays stay reachable in memory even when they are no longer required.
  3. Lastly, increase the maximum memory for your Java process using the -Xmx option.
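
A minimal sketch of suggestion 1, assuming (as in the question) that "data.txt" sits in the app's internal storage and is small enough to fit in an int-sized array:

File file = new File(context.getFilesDir(), "data.txt");
byte[] data = new byte[(int) file.length()];
FileInputStream fin = new FileInputStream(file);
try {
    int off = 0, n;
    // read() may return fewer bytes than requested, so loop until the array is full
    while (off < data.length && (n = fin.read(data, off, data.length - off)) > 0) {
        off += n;
    }
} finally {
    fin.close();
}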
Ashwinee K Jha
  • Thanks a lot, and also for the advice below; basically I had the file upload, then cleared out the old one, and continued to do this so it didn't become excessively big. – eWizardII Jan 31 '13 at 03:23
  • 3 is bad advice. There are cases of people increasing the max heap size, which actually *causes* the OutOfMemoryError. [Here is a good example.](https://bugs.openjdk.java.net/browse/JDK-6478546) I have also seen this occur with the Oracle JDK. I think it has to do with how when you increase the max heap size using `-Xmx` you also are decreasing the native memory available, and FileInputStream uses native memory, although that is just a theory. The `-Xmx` flag only increases the max heap size, not the "maximum memory for your java process" as you state, which is limited to 4GB for 32-bit Java. – Max Oct 11 '18 at 13:30
3

Since we know the size of the file, roughly half of the memory can be saved by allocating a byte array of the exact size up front rather than growing a buffer:

byte[] data = new byte[(int) file.length()];
FileInputStream fin = new FileInputStream(file);
int offset = 0, n;
// read() may deliver fewer bytes than requested, so accumulate until the array is full
while (offset < data.length && (n = fin.read(data, offset, data.length - offset)) > 0) {
    offset += n;
}

This avoids allocating unnecessary additional structures: the byte array is allocated only once and has the correct size from the beginning. The while loop ensures all data is loaded, since read(byte[], offset, length) may read only part of the file and returns the number of bytes actually read.

Clarification: when the StringBuilder runs out of space, it allocates a new buffer roughly twice the size of the old one, and at that moment both buffers are live, so about twice the minimally required memory is in use. In the most degenerate case (the last byte does not fit into an already big buffer), nearly three times the minimal amount of RAM may be required.
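
A small illustration of the growth rule (per the OpenJDK source linked in the comments below; the default initial capacity is 16 chars):

StringBuilder sb = new StringBuilder();   // capacity 16
sb.append("0123456789abcdef");            // 16 chars, fits exactly
sb.append('!');                           // overflow: new capacity (16 + 1) * 2 = 34
// during the copy, both the old and new char[] buffers are live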

Audrius Meškauskas
  • When the StringBuilder runs out, it allocates a new buffer. At that point, we have two buffers, old and new. Hence at this point we are using two times more memory than is minimally required. – Audrius Meškauskas Jan 30 '13 at 07:57
  • I also thought that the new buffer would have twice the size of the old one (it's certainly bigger, right? :p). The documentation wasn't clear on this. – keyser Jan 30 '13 at 07:58
  • Yes, it will probably be (old_size + 1) * 2, as can be verified in the [source code of OpenJDK](http://grepcode.com/file/repository.grepcode.com/java/root/jdk/openjdk/6-b14/java/lang/AbstractStringBuilder.java#AbstractStringBuilder.ensureCapacity%28int%29). Hence nearly three times more memory than necessary may be required in the most degenerate corner cases. – Audrius Meškauskas Jan 30 '13 at 08:01
  • That's what I thought. This does help promote your solution ;) Thanks for the link. – keyser Jan 30 '13 at 08:02
  • What if the file size doesn't fit to an int? What if the array is still too large for memory? Not an answer. – user207421 Jan 30 '13 at 11:40
  • The task is to *read the file into array* so the array itself can be defined and fits into memory. – Audrius Meškauskas Jan 30 '13 at 11:52
  • I tried to implement this solution and am currently testing it; I think you meant to say fin.read instead of file.read. Thanks for the help. – eWizardII Jan 31 '13 at 15:46
  • Thanks, this seems to be working quite well; I will continue testing it. The only thing I might have run into is the problem of it reading only part of the file, but I will see if this is a glitch in my other code. – eWizardII Feb 05 '13 at 15:23
2

If you don't have enough memory to store the whole file, you can try rethinking your algorithm to process the file's data while reading it, without constructing a large byte[] holding all of it.

If you have already tried increasing the Java memory with the -Xmx parameter, then there is no solution that will let you store in memory data that simply cannot fit there due to its size.
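
A rough sketch of that idea, where handleChunk(...) is a hypothetical placeholder for whatever your algorithm does with each piece:

InputStream in = context.openFileInput("data.txt");
try {
    byte[] chunk = new byte[8192];
    int len;
    while ((len = in.read(chunk)) > 0) {
        handleChunk(chunk, len); // only 8 KB is held at any one time
    }
} finally {
    in.close();
}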

Andremoniy
0

You are copying bytes into chars (which use twice the space) and back into bytes again.

InputStream in = context.openFileInput("data.txt");
ByteArrayOutputStream baos = new ByteArrayOutputStream();
byte[] bytes = new byte[8192];
for (int len; (len = in.read(bytes)) > 0; )
    baos.write(bytes, 0, len);
in.close();
return baos.toByteArray();

This will halve your memory requirement, but it can still mean you run out of memory. In that case you have to:

  • increase your maximum heap size
  • process the file progressively instead of all at once
  • use memory mapped files, which allow you to "load" a file without using much heap (sketched below).
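
A minimal sketch of the memory-mapped option, assuming a plain File on disk (FileChannel and MappedByteBuffer are from java.nio):

FileChannel fc = new RandomAccessFile(file, "r").getChannel();
MappedByteBuffer mbb = fc.map(FileChannel.MapMode.READ_ONLY, 0, fc.size());
// read from mbb directly (mbb.get(), mbb.getInt(), ...) instead of
// copying it all into a byte[], which would defeat the purpose
fc.close();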
Peter Lawrey
0

This is similar to File to byte[] in Java

You're currently reading in bytes, converting them to characters, and then trying to turn them back into bytes. From the InputStreamReader class in the Java API:

An InputStreamReader is a bridge from byte streams to character streams: It reads bytes and decodes them into characters.

It would be way more efficient to just read in bytes.

One way would be to read the bytes straight from the stream returned by context.openFileInput(), or to use the Jakarta Commons IOUtils.toByteArray(InputStream), or if you're using JDK 7 you can use Files.readAllBytes(Path).
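
For example, with JDK 7 (the path here is an assumption based on the question's context.openFileInput()):

byte[] data = Files.readAllBytes(
        Paths.get(context.getFilesDir().getAbsolutePath(), "data.txt"));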

lumiera
-1

The 'cleaner and faster way' is not to do it at all. It doesn't scale. Process the file a piece at a time.
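
For instance, if the goal is to upload the file (as the question's comments suggest), you can stream it straight to the destination; out here is a hypothetical OutputStream for the upload:

InputStream in = context.openFileInput("data.txt");
byte[] buf = new byte[8192];
int len;
while ((len = in.read(buf)) > 0) {
    out.write(buf, 0, len);
}
in.close();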

user207421
-2

This solution tests the free memory before loading:

File test = new File("c:/tmp/example.txt");

long freeMemory = Runtime.getRuntime().freeMemory();
if (test.length() < freeMemory) {
    byte[] bytes = new byte[(int) test.length()];
    FileChannel fc = new FileInputStream(test).getChannel();
    MappedByteBuffer mbb = fc.map(FileChannel.MapMode.READ_ONLY, 0, fc.size());

    while (mbb.hasRemaining()) {
        mbb.get(bytes);
    }
    fc.close();
}
JayTee
  • And? What does he do if there isn't enough memory? Not an answer. – user207421 Jan 30 '13 at 11:37
  • If there's not enough memory then it is not doable; the requirement as stated in the question is to have a byte array that holds the entire file contents! Yes, I agree with your post: stream and handle in chunks if possible. – JayTee Jan 30 '13 at 12:33