
I need to read a very large file (1.11 GB) into memory and process it byte by byte. As far as I can tell, the only way to do this is with an ArrayList (I can't use a byte[] because it would exceed the limit). There is no way to make the file smaller (I'm using it as a test to measure how long my program takes to process data). I then need to write the ArrayList back to the hard drive as a file (still 1.11 GB), but I'm less worried about writing than reading. Speed is of the essence, so splitting the file into sub-segments is to be avoided unless someone has a quick way of doing so.

Kai Arakawa
  • Please explain: _I can't use a byte[] because then it will exceed the limit_ – Sotirios Delimanolis May 24 '15 at 01:28
  • `ArrayList` is a `byte[]` under the hood – kaykay May 24 '15 at 01:31
  • @kaykay No it's not, it's an Object[]. Which uses 4 or 8 times as much memory as the byte array would. – user253751 May 24 '15 at 01:39
  • @SotiriosDelimanolis Arrays have a maximum length of 2^31-1 elements, which is too small for my program (it's a file manipulation program) – Kai Arakawa May 24 '15 at 01:43
  • @stealth9799 ArrayLists have the same limit, since they use arrays. Still, 2^31 bytes is roughly 2 GB, which is big enough to hold a 1.1 GB file. – vandale May 24 '15 at 01:50
  • I'm voting to close this question as off-topic because this is an [X-Y Problem](http://meta.stackexchange.com/questions/66377/what-is-the-xy-problem). You do not explain why you want to do what you want to do, which would probably lead to a more appropriate solution. In every scenario I can think of this is about the most naive and worst way to process a file at the `byte` level. –  May 24 '15 at 02:07

1 Answer


You are trying to solve this problem the wrong way (and it won't work; see note 1).

The possible ways to solve this are:

  • Redesign the algorithm so that it doesn't need to read the entire file into memory in one go.

  • Read the data into multiple byte[] objects to get around the 2^31 array size limit.

  • Map the file using multiple ByteBuffer objects (note 2); see Java MemoryMapping big files.
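
The second option (multiple byte[] objects) could look roughly like the sketch below. The class name `ChunkedFile`, its method names, and the chunk sizes are illustrative assumptions, not anything from the question; `readNBytes` requires Java 9 or later. The demo in `main` uses a tiny temporary file and a 4-byte chunk size only to force several chunks.

```java
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

// Hypothetical helper: holds a file as several byte[] chunks instead of one array.
final class ChunkedFile {

    /** Reads the whole file into arrays of at most chunkSize bytes each. */
    static byte[][] readAll(Path file, int chunkSize) throws IOException {
        long size = Files.size(file);
        int chunkCount = (int) ((size + chunkSize - 1) / chunkSize); // round up
        byte[][] chunks = new byte[chunkCount][];
        try (InputStream in = Files.newInputStream(file)) {
            long remaining = size;
            for (int i = 0; i < chunkCount; i++) {
                int len = (int) Math.min(chunkSize, remaining);
                byte[] chunk = new byte[len];
                // readNBytes blocks until len bytes are read or the stream ends.
                if (in.readNBytes(chunk, 0, len) < len) {
                    throw new EOFException("file shrank while reading");
                }
                chunks[i] = chunk;
                remaining -= len;
            }
        }
        return chunks;
    }

    /** Returns the byte at a long offset, as if the chunks were one big array. */
    static byte get(byte[][] chunks, int chunkSize, long index) {
        return chunks[(int) (index / chunkSize)][(int) (index % chunkSize)];
    }

    /** Writes the chunks back to disk in order. */
    static void writeAll(Path file, byte[][] chunks) throws IOException {
        try (OutputStream out = Files.newOutputStream(file)) {
            for (byte[] chunk : chunks) {
                out.write(chunk);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        Path src = Files.createTempFile("chunked-demo", ".bin");
        Path dst = Files.createTempFile("chunked-copy", ".bin");
        Files.write(src, new byte[] {1, 2, 3, 4, 5, 6, 7, 8, 9, 10});

        byte[][] chunks = readAll(src, 4); // tiny chunk size, demo only
        writeAll(dst, chunks);

        System.out.println("chunks: " + chunks.length);
        System.out.println("byte at offset 9: " + get(chunks, 4, 9L));
        System.out.println("round trip ok: "
                + Arrays.equals(Files.readAllBytes(src), Files.readAllBytes(dst)));
    }
}
```

For a real file a chunk size in the tens of megabytes is a more plausible choice, and the JVM heap (`-Xmx`) still has to be large enough to hold the whole file.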


1 - It won't work because ArrayList has an Object[] inside, and is therefore subject to the same length limit that you have with byte arrays. In addition, an ArrayList<Byte> will take 4 to 8 times as much memory as a byte[] representing the same number of bytes, or more if you populate the ArrayList<Byte> with Byte objects instantiated the wrong way.
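
The arithmetic behind note 1 can be sketched as follows. The figures are assumptions for illustration: the file is taken as 1.11 decimal gigabytes, and a reference is taken as 4 bytes (typical with compressed object pointers) or 8 bytes (without). Only the references in the backing Object[] are counted; array headers and spare capacity are ignored. The class name `ByteListFootprint` is hypothetical.

```java
// Hypothetical illustration of the reference overhead of an ArrayList<Byte>.
final class ByteListFootprint {

    /** Bytes taken by the references alone in the list's backing Object[]. */
    static long referenceBytes(long elementCount, int bytesPerReference) {
        return elementCount * bytesPerReference;
    }

    public static void main(String[] args) {
        long fileBytes = 1_110_000_000L; // assumed size of the 1.11 GB file

        System.out.println("byte[] payload:       " + fileBytes);
        System.out.println("refs at 4 bytes each: " + referenceBytes(fileBytes, 4));
        System.out.println("refs at 8 bytes each: " + referenceBytes(fileBytes, 8));

        // Byte.valueOf hands out cached instances for every byte value, so
        // boxing this way adds no per-element Byte objects on top of the
        // references counted above.
        Byte a = Byte.valueOf((byte) 42);
        Byte b = Byte.valueOf((byte) 42);
        System.out.println("same cached instance: " + (a == b));
    }
}
```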

2 - The Buffer APIs all use int sizes and offsets, and (AFAIK) do not support mapping of files >= 2^31 bytes into a single Buffer.
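
A minimal sketch of the multi-buffer mapping approach, assuming read-only access. The class name `MappedFile` and the segment size are illustrative; `FileChannel.map` itself takes a long position and long size, but each mapped region may not exceed Integer.MAX_VALUE bytes, hence the segments.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical wrapper: maps a file as several read-only segments and
// exposes byte access by long offset.
final class MappedFile {
    private final MappedByteBuffer[] segments;
    private final long segmentSize;
    private final long size;

    MappedFile(Path file, long segmentSize) throws IOException {
        if (segmentSize <= 0 || segmentSize > Integer.MAX_VALUE) {
            throw new IllegalArgumentException("segmentSize must be in 1..Integer.MAX_VALUE");
        }
        long fileSize;
        MappedByteBuffer[] mapped;
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            fileSize = ch.size();
            int count = (int) ((fileSize + segmentSize - 1) / segmentSize); // round up
            mapped = new MappedByteBuffer[count];
            for (int i = 0; i < count; i++) {
                long pos = i * segmentSize;
                long len = Math.min(segmentSize, fileSize - pos);
                mapped[i] = ch.map(FileChannel.MapMode.READ_ONLY, pos, len);
            }
        }
        // The mappings stay valid after the channel is closed.
        this.segments = mapped;
        this.segmentSize = segmentSize;
        this.size = fileSize;
    }

    long size() {
        return size;
    }

    /** Returns the byte at the given absolute file offset. */
    byte get(long index) {
        return segments[(int) (index / segmentSize)].get((int) (index % segmentSize));
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("mapped-demo", ".bin");
        Files.write(tmp, new byte[] {10, 20, 30, 40, 50, 60, 70});

        MappedFile mf = new MappedFile(tmp, 3); // tiny segment size, demo only
        System.out.println("size: " + mf.size());
        System.out.println("byte at offset 5: " + mf.get(5));
    }
}
```

With mapping, the operating system pages data in on demand, so the file does not have to fit in the Java heap. Whether this is faster than reading into arrays depends on the access pattern and is worth measuring.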

Stephen C