31

I'm trying to create a byte array whose size is of type long. For example, think of it as:

long x = _________;
byte[] b = new byte[x]; 

Apparently you can only specify an int for the size of a byte array.

Before anyone asks why I would need a byte array so large, I'll say I need to encapsulate data of message formats that I am not writing, and one of these message types has a length of an unsigned int (long in Java).

Is there a way to create this byte array?

I am thinking if there's no way around it, I can create a byte array output stream and keep feeding it bytes, but I don't know if there's any restriction on a size of a byte array...

Nayuki
jbu
  • 1
    unsigned int on most 32 bit architectures has only one more bit than int in java. A java long is 64 bit and is not suitable for an array index. – Jherico Jul 02 '09 at 00:01
  • 1
    I realize that but I do not know of a way to fully represent the amount of data that the message specifies without implementing my own data type. – jbu Jul 02 '09 at 00:45

5 Answers

26

(It is probably a bit late for the OP, but it might still be useful for others)

Unfortunately, Java does not support arrays with more than 2^31−1 elements. The maximum consumption is 2 GiB of space for a byte[] array, or 16 GiB of space for a long[] array.

While it is probably not applicable in this case, if the array is going to be sparse, you might be able to get away with using an associative data structure like a Map to match each used offset to the appropriate value. In addition, Trove provides a more memory-efficient implementation for storing primitive values than the standard Java collections.

If the array is not sparse and you really, really do need the whole blob in memory, you will probably have to use a two-dimensional structure, e.g. a Map keyed by the offset divided by 1024 (the chunk index), with each entry holding the corresponding 1024-byte array. This approach might be more memory efficient even for sparse arrays, since adjacent filled cells can share the same Map entry.
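A minimal sketch of the chunked Map idea described above, assuming Java 8 or later. The class name `SparseByteStore`, its methods, and the convention that unwritten offsets read as zero are illustrative assumptions, not part of any library:

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sparse byte store: only chunks that have been written are allocated. */
class SparseByteStore {
    private static final int CHUNK_SIZE = 1024;

    // Key: offset / CHUNK_SIZE (the chunk index). Value: that chunk's 1024 bytes.
    private final Map<Long, byte[]> chunks = new HashMap<>();

    void set(long offset, byte value) {
        checkOffset(offset);
        byte[] chunk = chunks.computeIfAbsent(offset / CHUNK_SIZE, k -> new byte[CHUNK_SIZE]);
        chunk[(int) (offset % CHUNK_SIZE)] = value;
    }

    /** Returns the stored byte, or 0 if nothing was ever written to that chunk. */
    byte get(long offset) {
        checkOffset(offset);
        byte[] chunk = chunks.get(offset / CHUNK_SIZE);
        return chunk == null ? 0 : chunk[(int) (offset % CHUNK_SIZE)];
    }

    int allocatedChunks() {
        return chunks.size();
    }

    private static void checkOffset(long offset) {
        if (offset < 0) {
            throw new IndexOutOfBoundsException("negative offset: " + offset);
        }
    }
}
```

Neighbouring offsets land in the same chunk, so a run of adjacent writes costs one Map entry rather than one entry per byte.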

Robert
thkala
7

A byte[] with size of the maximum 32-bit signed integer would require 2GB of contiguous address space. You shouldn't try to create such an array. Otherwise, if the size is not really that large (and it's just a larger type), you could safely cast it to an int and use it to create the array.
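If the length really is small and only arrives in a larger type, a checked conversion is safer than a bare cast, which would silently wrap. The following is a sketch only; `LengthCheck` and `allocate` are made-up names, and `Math.toIntExact` requires Java 8 or later. Note that some JVMs also refuse array sizes very close to `Integer.MAX_VALUE`, and the allocation can still fail with `OutOfMemoryError` if the heap is too small.

```java
class LengthCheck {
    /** Allocates a buffer for a length carried in a long, or fails loudly if it cannot fit. */
    static byte[] allocate(long length) {
        if (length < 0) {
            throw new IllegalArgumentException("negative length: " + length);
        }
        // Math.toIntExact throws ArithmeticException when the value does not fit in an int,
        // whereas a plain (int) cast would wrap around without warning.
        int size = Math.toIntExact(length);
        return new byte[size];
    }
}
```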

Mehrdad Afshari
  • 1
    The original questioner presumably is not using a 32-bit JVM. An array of int[] with 2^32 bytes is constructible... – Tom Hawtin - tackline Jul 01 '09 at 23:52
  • actually the max is 31-bit integer since java's types are signed. So 2 gigs roughly. – jbu Jul 01 '09 at 23:53
  • 1
    jbu: Oops. You're right. Obviously, it's also available in a 64 bit process but I meant to say it's too large and if you are really creating such a large array, you're most probably going the wrong way. – Mehrdad Afshari Jul 01 '09 at 23:56
  • mehrdad: I don't know if *I'm* going the wrong way...again it's a message type that I'm handling that can be that big (theoretically). It seems that the one going the wrong way is the guy who created this message type. I do not know whether or not he uses the full size of his message, but I'd like to support his message and not throw away bytes (even if he isn't using them). – jbu Jul 02 '09 at 00:08
  • 1
    If you really expect the message to be that large, you should use some kind of buffering mechanism so that you don't load the whole thing at once into memory. I just tried creating an array of 2^30 bytes (Integer.MAX_VALUE/2) in a 64 bit JVM and it throws OutOfMemoryError. – Mehrdad Afshari Jul 02 '09 at 00:11
  • yes i believe it's a ... stack overflow :) I guess I need to throw away bytes then or ask this guy if he intends to use all those bytes. – jbu Jul 02 '09 at 00:13
  • jbu: Actually, it's not created on stack. It's lack of enough Java heap space. I could create Integer.MAX_VALUE/4 bytes on 64 bit and much less (nowhere near that) in 32 bit. You should really think about buffering if you expect the message to be larger than a couple hundred megabytes. – Mehrdad Afshari Jul 02 '09 at 00:15
  • As someone who has a computer with 32GB of memory, I don't see allocating 2GB of contiguous memory a problem.... Eventually this is going to be an artificial limitation that needs to change. – Jason Sep 15 '13 at 13:51
1

You should probably be using a stream to read your data in and another to write it out. If you are going to need access to data later on in the file, save it. If you need access to something you haven't run into yet, you need a two-pass system where you run through once and store the "stuff" you'll need for the second pass, then run through again.
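As a rough illustration of the streaming approach, the sketch below copies a payload whose length is held in a `long` through a small fixed buffer, so the whole message is never in memory at once. `StreamCopy`, `copyExactly`, and the 8 KiB buffer size are assumptions made for the example:

```java
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

class StreamCopy {
    /** Copies exactly {@code length} bytes from in to out using a fixed-size buffer. */
    static void copyExactly(InputStream in, OutputStream out, long length) throws IOException {
        byte[] buffer = new byte[8192];
        long remaining = length;
        while (remaining > 0) {
            // Never ask for more than the buffer holds or the message has left.
            int want = (int) Math.min(buffer.length, remaining);
            int read = in.read(buffer, 0, want);
            if (read < 0) {
                throw new EOFException("stream ended with " + remaining + " bytes still expected");
            }
            out.write(buffer, 0, read);
            remaining -= read;
        }
    }
}
```

The `remaining` counter is a `long`, so the declared message length can exceed `Integer.MAX_VALUE` even though no single array does.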

Compilers work this way.

The only case for loading in the entire array at once is if you have to repeatedly randomly access many locations throughout the array. If this is the case, I suggest you load it into multiple byte arrays all stored in a single container class.

The container class would have an array of byte arrays, but from outside all the accesses would seem contiguous. You would just ask for byte 49874329128714391837 and your class would divide your long index by the size of each byte array to calculate which array to access, then use the remainder to determine the byte within it.

It could also have methods to store and retrieve "chunks" that span byte-array boundaries. These would require creating a temporary copy, but the cost of creating a few temporary arrays would be more than made up for by the fact that you don't have a locked 2 GB space allocated, which I think could just destroy your performance.
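The divide-and-remainder lookup could look roughly like this. It is a sketch under assumptions: `ChunkedByteArray` and its methods are invented for illustration, and the chunk size is left to the caller:

```java
/** Hypothetical long-indexed byte container backed by several ordinary byte arrays. */
class ChunkedByteArray {
    private final long length;
    private final int chunkSize;
    private final byte[][] chunks;

    ChunkedByteArray(long length, int chunkSize) {
        if (length < 0 || chunkSize <= 0) {
            throw new IllegalArgumentException("length must be >= 0 and chunkSize > 0");
        }
        this.length = length;
        this.chunkSize = chunkSize;
        // Number of chunks, rounding up so a partial last chunk is included.
        long count = length / chunkSize + (length % chunkSize == 0 ? 0 : 1);
        chunks = new byte[Math.toIntExact(count)][];
        for (int i = 0; i < chunks.length; i++) {
            long start = (long) i * chunkSize;
            chunks[i] = new byte[(int) Math.min(chunkSize, length - start)];
        }
    }

    byte get(long index) {
        checkIndex(index);
        // Quotient selects the chunk, remainder selects the byte inside it.
        return chunks[(int) (index / chunkSize)][(int) (index % chunkSize)];
    }

    void set(long index, byte value) {
        checkIndex(index);
        chunks[(int) (index / chunkSize)][(int) (index % chunkSize)] = value;
    }

    long length() {
        return length;
    }

    private void checkIndex(long index) {
        if (index < 0 || index >= length) {
            throw new IndexOutOfBoundsException("index " + index + ", length " + length);
        }
    }
}
```

Because callers only see `get`, `set`, and `length`, the backing storage can later be swapped for something else without touching them.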

Edit: ps. If you really need the random access and can't use streams then implementing a containing class is a Very Good Idea. It will let you change the implementation on the fly from a single byte array to a group of byte arrays to a file-based system without any change to the rest of your code.

Bill K
  • 1
    I doubt even in that case you could allocate such amount of memory. Let `long` alone, the second line throws exception on a 64 bit JRE on my machine: "byte[] a1 = new byte[Integer.MAX_VALUE/4]; byte[] a2 = new byte[Integer.MAX_VALUE/4];" He would have to use some kind of in memory buffer if he's dealing with such a large amount of data. – Mehrdad Afshari Jul 02 '09 at 00:18
  • That's why I suggested a small class that could be used to change the implementation on the fly. Of course, streaming should be used if at all possible (and it absolutely should be possible!) but if not, it might be possible to use some kind of caching algorithm with smaller blocks held by soft references. – Bill K Jul 02 '09 at 16:27
1

It's not of immediate help, but creating arrays with larger sizes (via longs) is a proposed language change for Java 7. Check out the Project Coin proposals for more info.

Brian Agnew
  • 1
    References: https://blogs.oracle.com/darcy/entry/project_coin_consideration_round_2 ; http://mail.openjdk.java.net/pipermail/coin-dev/2009-March/000869.html – Nayuki Jan 19 '16 at 21:16
0

One way to "store" the array is to write it to a file and then access it (if you need to access it like an array) using a RandomAccessFile. The API for that class uses a long as the index into the file instead of an int. It will be slower, but much less hard on the memory.

This applies when you can't extract what you need during the initial input scan.
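A small sketch of the file-backed approach, using `RandomAccessFile.seek(long)` for positioning. The wrapper class `FileBackedBytes` and its method names are assumptions for illustration; a real implementation would probably read and write blocks rather than single bytes, since every call here goes to the file:

```java
import java.io.IOException;
import java.io.RandomAccessFile;

/** Hypothetical array-like view over a file, indexed by long. */
class FileBackedBytes implements AutoCloseable {
    private final RandomAccessFile file;

    FileBackedBytes(String path, long length) throws IOException {
        file = new RandomAccessFile(path, "rw");
        // Sets the file size; the contents of any newly extended region are not defined.
        file.setLength(length);
    }

    byte get(long index) throws IOException {
        file.seek(index);
        return file.readByte(); // throws EOFException past the end of the file
    }

    void set(long index, byte value) throws IOException {
        file.seek(index);
        file.writeByte(value);
    }

    long length() throws IOException {
        return file.length();
    }

    @Override
    public void close() throws IOException {
        file.close();
    }
}
```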

Kathy Van Stone