I need to read a very large file (1.11 GB) into memory and process it at the byte level. The only way I can see to do this is to use an ArrayList (I can't use a byte[] because it would exceed the array size limit). There is no way to make the file smaller (I'm using it to test how long my program takes to process data). I then need to write the ArrayList back to the hard drive as a file (still 1.11 GB). I'm less worried about writing than about reading. Speed is also of the essence, so splitting the file into segments is to be avoided unless someone has a quick way of doing so.
-
Please explain: _I can't use a byte[] because then it will exceed the limit_ – Sotirios Delimanolis May 24 '15 at 01:28
-
`ArrayList` is a `byte[]` under the hood – kaykay May 24 '15 at 01:31
-
@kaykay No it's not, it's an `Object[]`, which uses 4 or 8 times as much memory as the byte array would. – user253751 May 24 '15 at 01:39
-
@SotiriosDelimanolis arrays have a max length of 2^31-1 places, that is too little for my program (it's a file manipulation program) – Kai Arakawa May 24 '15 at 01:43
-
@stealth9799 ArrayLists have the same limit, since they use arrays. Still, 2^31 bytes is roughly 2 GB, which is big enough to hold a 1.1 GB file. – vandale May 24 '15 at 01:50
-
I'm voting to close this question as off-topic because this is an [X-Y Problem](http://meta.stackexchange.com/questions/66377/what-is-the-xy-problem). You do not explain why you want to do what you want to do, which would probably lead to a more appropriate solution. In every scenario I can think of, this is about the most naive and worst way to process a file at the `byte` level. – May 24 '15 at 02:07
1 Answer
You are trying to solve this problem the wrong way (and it won't work<sup>1</sup>).

The possible ways to solve this are:

- Redesign the algorithm so that it doesn't need to read the entire file into memory ... in one go.
- Read the data into multiple `byte[]` objects to get around the 2^31 array size limit.
- Map the file using multiple `ByteBuffer` objects<sup>2</sup>; see Java MemoryMapping big files.
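
The second option could look roughly like the sketch below. It is only an illustration under assumptions: the class name `ChunkedBytes`, its methods, and the chunk size are hypothetical and not from the question, and `InputStream.readNBytes` requires Java 9 or later. The bytes are held in a list of plain `byte[]` chunks and addressed with a `long` index.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper: a file's bytes held as a list of byte[] chunks. */
final class ChunkedBytes {
    private final List<byte[]> chunks = new ArrayList<>();
    private final int chunkSize;
    private long length;

    private ChunkedBytes(int chunkSize) {
        this.chunkSize = chunkSize;
    }

    /** Reads the whole file into chunks of at most chunkSize bytes each. */
    static ChunkedBytes read(Path file, int chunkSize) throws IOException {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        ChunkedBytes result = new ChunkedBytes(chunkSize);
        try (InputStream in = Files.newInputStream(file)) {
            while (true) {
                byte[] chunk = new byte[chunkSize];
                // readNBytes blocks until the chunk is full or the stream ends.
                int filled = in.readNBytes(chunk, 0, chunkSize);
                if (filled == 0) {
                    break;
                }
                // Trim the last chunk so no padding bytes are written back later.
                result.chunks.add(filled == chunkSize ? chunk : Arrays.copyOf(chunk, filled));
                result.length += filled;
                if (filled < chunkSize) {
                    break;
                }
            }
        }
        return result;
    }

    long length() {
        return length;
    }

    byte get(long index) {
        checkIndex(index);
        return chunks.get((int) (index / chunkSize))[(int) (index % chunkSize)];
    }

    void set(long index, byte value) {
        checkIndex(index);
        chunks.get((int) (index / chunkSize))[(int) (index % chunkSize)] = value;
    }

    /** Writes all chunks back out in order, creating or truncating the target. */
    void write(Path file) throws IOException {
        try (OutputStream out = Files.newOutputStream(file)) {
            for (byte[] chunk : chunks) {
                out.write(chunk);
            }
        }
    }

    private void checkIndex(long index) {
        if (index < 0 || index >= length) {
            throw new IndexOutOfBoundsException("index " + index + ", length " + length);
        }
    }
}
```

With a chunk size of, say, 64 MiB (`64 * 1024 * 1024`), a 1.11 GB file ends up in 18 chunks or so, and the JVM heap still has to be large enough to hold all of them (for example via `-Xmx`).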
1 - It won't work because `ArrayList` has an `Object[]` inside, and is therefore subject to the same limitation you have with byte arrays. In addition, an `ArrayList<Byte>` will take 4 to 8 times as much memory as a `byte[]` representing the same number of bytes. Or more, if you populate the `ArrayList<Byte>` with `Byte` objects instantiated the wrong way.

2 - The `Buffer` APIs all use `int` sizes and offsets, and (AFAIK) do not support mapping of files >= 2^31 bytes into a single `Buffer`.
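
As a rough sketch of the third option, the file can be mapped as several `MappedByteBuffer` segments, each smaller than 2^31 bytes, and addressed with a `long` index. The class name `MappedSegments`, its methods, and the segment size are hypothetical choices for illustration. The sketch assumes the file exists and can be opened for both reading and writing.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Hypothetical helper: a file mapped as several fixed-size segments. */
final class MappedSegments {
    private final MappedByteBuffer[] segments;
    private final int segmentSize;
    private final long length;

    private MappedSegments(MappedByteBuffer[] segments, int segmentSize, long length) {
        this.segments = segments;
        this.segmentSize = segmentSize;
        this.length = length;
    }

    /** Maps the whole file read-write in segments of at most segmentSize bytes. */
    static MappedSegments map(Path file, int segmentSize) throws IOException {
        if (segmentSize <= 0) {
            throw new IllegalArgumentException("segmentSize must be positive");
        }
        // A mapping stays valid after its channel is closed, so the channel
        // only needs to live for the duration of this method.
        try (FileChannel channel = FileChannel.open(
                file, StandardOpenOption.READ, StandardOpenOption.WRITE)) {
            long length = channel.size();
            int count = (int) ((length + segmentSize - 1) / segmentSize);
            MappedByteBuffer[] segments = new MappedByteBuffer[count];
            for (int i = 0; i < count; i++) {
                long offset = (long) i * segmentSize;
                long size = Math.min(segmentSize, length - offset);
                segments[i] = channel.map(FileChannel.MapMode.READ_WRITE, offset, size);
            }
            return new MappedSegments(segments, segmentSize, length);
        }
    }

    long length() {
        return length;
    }

    byte get(long index) {
        checkIndex(index);
        return segments[(int) (index / segmentSize)].get((int) (index % segmentSize));
    }

    void put(long index, byte value) {
        checkIndex(index);
        segments[(int) (index / segmentSize)].put((int) (index % segmentSize), value);
    }

    /** Asks the OS to flush modified pages of every segment to the file. */
    void force() {
        for (MappedByteBuffer segment : segments) {
            segment.force();
        }
    }

    private void checkIndex(long index) {
        if (index < 0 || index >= length) {
            throw new IndexOutOfBoundsException("index " + index + ", length " + length);
        }
    }
}
```

With this approach the data lives outside the Java heap and changes are written through to the same file, so there is no separate "write it back" step beyond calling `force()`. Whether it is faster than reading into `byte[]` chunks depends on the access pattern and the operating system, so it is worth measuring both.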