FastByteArrayOutputStream has a write method that calls addBuffer, which accepts a minCapacity as an int and allocates the next block sized to the next power of 2 of minCapacity. So the block size keeps growing in order to accommodate the file in the buffer.

I have a file greater than the maximum size (internally I divide it into 3 files, push them to the output stream, and finally create a file in Azure storage). While writing it to the buffer, minCapacity goes over the maximum int value of 2147483647, and the next block size wraps around to the signed value -2147483648, which is invalid and throws the exception shown in the attached image.
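
For illustration, here is a minimal standalone sketch of that failure mode. It mirrors the usual next-power-of-2 bit trick, not the actual Spring source: the int arithmetic wraps to Integer.MIN_VALUE once the requested capacity exceeds 2^30.

```java
// Hypothetical sketch, not the FastByteArrayOutputStream source: shows how a
// next-power-of-2 computation in int arithmetic wraps negative for large inputs.
public class BlockSizeOverflowDemo {

    static int nextPowerOf2(int val) {
        int n = val - 1;
        n |= n >>> 1;
        n |= n >>> 2;
        n |= n >>> 4;
        n |= n >>> 8;
        n |= n >>> 16;
        return n + 1;  // overflows to Integer.MIN_VALUE when val > 2^30
    }

    public static void main(String[] args) {
        System.out.println(nextPowerOf2(1_000_000));      // 1048576 (2^20), fine
        System.out.println(nextPowerOf2(1_500_000_000));  // -2147483648: wrapped
    }
}
```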

  • If you need to store a file that big in memory, it's probably time to rethink your program's structure. – 0x150 Apr 25 '23 at 06:55
  • I am not storing the file in memory. I am adding 3 files to one output stream, which creates/appends the final file in an Azure storage account. But it overflows in between, so the file is never created. –  Apr 25 '23 at 07:49
  • Please update the question to explain what you are doing. Edit the question. – Stephen C Apr 25 '23 at 07:51
  • 1
    "I am not storing file in memory. " Yes you are. Otherwise you wouldn't need a byte array output stream at all. And you can't describe using more than 2^31 bytes as 'memory efficient'. There is never any reason for buffers this size. – user207421 Apr 25 '23 at 07:52
  • If you want to send three files to one outputstream, then you can open a file, use its `transferTo` method to send it to the outputstream, then open the next file, send it to the outputstream and then the last file and send it to the outputstream. Unless there is something you're not telling us, you really don't need to load all three files into memory to transfer them to an outputstream. – Mark Rotteveel Apr 25 '23 at 09:09
  • @StephenC I have updated the question, but it doesn't really change the initial question. The purpose is to pass a file to an output stream, which ends up requesting a block size greater than Integer.MAX_VALUE. –  Apr 25 '23 at 09:25
  • No, that's not the purpose. The purpose is just the part about 'pass a file to output stream'. The part about 'which creates block size greater than Integer max' is just your misconceived idea about what a solution would look like. Lose it and your problem disappears. – user207421 Apr 25 '23 at 09:49
  • @user207421 Can you please suggest a solution? I might be wrong in explaining it, but I'm trying my best here. As in my comment on the answer below: "I am reading 3 sub-files created from a larger file and writing them to a file in Azure storage. To write to a file in an Azure storage account, I don't know what I can use other than an OutputStream. I was also wondering if there is anything else which uses a long as input to create the block size." If I should lose the idea of passing it to an output stream, can you suggest what else I can use in this scenario? –  Apr 25 '23 at 10:00
  • So your question is about how to write to a file in Azure storage. Not about huge byte array output streams. What output stream are you *presently* using? and what makes you think you need a different one? and what does 'us[ing] `Long` as input to create block size' have to do with anything? All you have to do is adopt the `transferTo()` suggestion above with the successive input streams. I don't know why you haven't tried it already. You need to exhibit some *relevant* code here. – user207421 Apr 25 '23 at 10:03
  • The classes mentioned in this question don't appear to be part of the standard java API. Are they Azure, Spring? Please update the tags to reflect the third party platform. – WJS Jul 21 '23 at 13:46

1 Answer

FastByteArrayOutputStream will not work for your use-case. While it uses a Deque<byte[]> internally, that is just an optimization to reduce the amount of copying. If you look at the source code, you will see that there are a number of places that limit the size to the maximum size of a byte[] ... which is 2^31 - 1 bytes; i.e. 2GB - 1.

> I've got a file greater than max size ...

Possible solutions:

  1. If you are outputting the file, write the data directly to the file or socket. There is no obvious reason to use a ByteArrayOutputStream variant for this; it won't improve performance. (A sketch follows below.)

  2. Take the source code for FastByteArrayOutputStream and modify it for your own purposes. However, you will run into the problem that the toByteArray and toByteArrayUnsafe methods are unimplementable for content of 2GB and larger, and there are similar issues elsewhere.

There may be other solutions, but it is hard to say. You don't explain what you are actually doing.
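
A minimal sketch of option 1, assuming Java 11+ and local files (the file names are placeholders): each source is streamed straight into the destination with `InputStream.transferTo`, so heap usage stays constant regardless of file size.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;

public class StreamConcat {

    // Copies each source straight into the destination stream in small chunks;
    // nothing is ever buffered in full, so there is no 2^31 - 1 size limit.
    static void concatTo(OutputStream out, Path... sources) throws IOException {
        for (Path source : sources) {
            try (InputStream in = Files.newInputStream(source)) {
                in.transferTo(out);  // Java 9+; uses a small fixed-size buffer internally
            }
        }
    }

    public static void main(String[] args) throws IOException {
        try (OutputStream out = Files.newOutputStream(Path.of("merged.bin"))) {
            concatTo(out, Path.of("part1.bin"), Path.of("part2.bin"), Path.of("part3.bin"));
        }
    }
}
```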

Stephen C
  • I am reading 3 sub-files created from a larger file and writing them to a file in Azure storage. To write to a file in an Azure storage account, I don't know what I can use other than an OutputStream. I was also wondering if there is anything else which uses a long as input to create the block size. –  Apr 25 '23 at 09:55
  • 1
    That wouldn't work in Java. Arrays are limited by the language to 2^31 - 1 elements. – Stephen C Apr 25 '23 at 10:11
  • What kind of Azure storage? Blob? Queue? File? DataLake? – Stephen C Apr 25 '23 at 10:17
  • from blob to file –  Apr 25 '23 at 10:30
  • Well you should probably be opening a `BlobOutputStream`. Take a look at https://stackoverflow.com/questions/40748705 for some example code. – Stephen C Apr 25 '23 at 10:55
  • Alternatively, you should be able to use a Java SE `PipedInputStream` / `PipedOutputStream` pair. – Stephen C Apr 25 '23 at 10:57
  • I agree about the blob, but the piped streams don't really get you anywhere at all here, just another pointless thread and another pointless set of I/O operations. They're just a toy really. I used them once in about 1997, never again. – user207421 Apr 26 '23 at 08:15
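
Putting the `BlobOutputStream` suggestion and `transferTo` together, a hedged sketch (assuming the azure-storage-blob v12 SDK; the connection-string environment variable, container name, and blob names are hypothetical placeholders):

```java
import com.azure.storage.blob.BlobContainerClient;
import com.azure.storage.blob.BlobContainerClientBuilder;
import java.io.InputStream;
import java.io.OutputStream;

public class MergeBlobs {

    public static void main(String[] args) throws Exception {
        // Placeholder connection details; substitute your own.
        BlobContainerClient container = new BlobContainerClientBuilder()
                .connectionString(System.getenv("AZURE_STORAGE_CONNECTION_STRING"))
                .containerName("my-container")
                .buildClient();

        // Closing the BlobOutputStream flushes and commits the target blob.
        try (OutputStream out = container.getBlobClient("merged.bin")
                .getBlockBlobClient()
                .getBlobOutputStream()) {
            for (String part : new String[] {"part1.bin", "part2.bin", "part3.bin"}) {
                try (InputStream in = container.getBlobClient(part).openInputStream()) {
                    in.transferTo(out);  // streams in chunks; never holds a whole file
                }
            }
        }
    }
}
```

The point of the sketch is the shape, not the exact client calls: everything flows source stream to destination stream, so no byte array output stream (and no 2GB limit) is involved anywhere.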