4

I'm dealing with huge arrays in my application and need to resize them.

Let's say you have a 2 GB array and you want to resize it to 3 GB. Is there a way to do that without temporarily needing 5 GB?

For instance, given a 1 GB heap set with the -Xmx1G flag:

import java.util.Arrays;

public class Biggy {
    public static void main(String[] args) {
        int[] array;

        array = new int[100 * 1000 * 1000]; // needs 400 MB, works
        array = null; // needed for GC
        array = new int[150 * 1000 * 1000]; // needs 600 MB, works
        array = null; // needed for GC
        array = new int[100 * 1000 * 1000]; // needs 400 MB, works
        array = Arrays.copyOf(array, 150 * 1000 * 1000); // needs 400 + 600 = 1000 MB, throws OutOfMemoryError
    }
}

So, is there a way to resize arrays without requiring that extra temporary memory?

dagnelies
  • 5,203
  • 5
  • 38
  • 56

2 Answers

3

I would use a List<int[]> where each int[] has a fixed size, e.g. 128 million entries. Growing the whole "collection" then only involves adding another array. I use IntBuffer in direct memory, which avoids using the heap (or memory-mapped files, which use neither heap nor direct memory ;). This is ugly, so I use a wrapper class to hide the ugliness, and it performs pretty nicely. With memory-mapped files I can even use an "array" that is larger than physical memory.

private final List<IntBuffer> array = new ArrayList<IntBuffer>();

public int get(long n) {
    // high bits select the buffer, low 27 bits the index within it (2^27 = 128M ints per buffer)
    return array.get((int) (n >> 27)).get((int) (n & ((1 << 27) - 1)));
}

public void put(long n, int v) {
    array.get((int) (n >> 27)).put((int) (n & ((1 << 27) - 1)), v);
}
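
For completeness, here is a minimal sketch of how such a chunked structure might allocate and grow its backing buffers. The class and member names are illustrative, not from the answer; it uses direct buffers, and a memory-mapped variant would obtain each IntBuffer from FileChannel.map(...) instead of ByteBuffer.allocateDirect(...).

// Illustrative sketch only: a chunked, long-indexed int "array" backed by direct buffers.
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.IntBuffer;
import java.util.ArrayList;
import java.util.List;

public class ChunkedIntArray {
    private static final int CHUNK_BITS = 27;                  // 2^27 = 128M ints per chunk
    private static final long CHUNK_MASK = (1 << CHUNK_BITS) - 1;

    private final List<IntBuffer> chunks = new ArrayList<IntBuffer>();

    // Growing only appends one new chunk (512 MB here); nothing existing is copied.
    public void grow() {
        chunks.add(ByteBuffer.allocateDirect((1 << CHUNK_BITS) * 4)
                             .order(ByteOrder.nativeOrder())
                             .asIntBuffer());
    }

    public int get(long n) {
        return chunks.get((int) (n >> CHUNK_BITS)).get((int) (n & CHUNK_MASK));
    }

    public void put(long n, int v) {
        chunks.get((int) (n >> CHUNK_BITS)).put((int) (n & CHUNK_MASK), v);
    }
}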
Peter Lawrey
  • 525,659
  • 79
  • 751
  • 1,130
    You can't memory-map pieces larger than two GiB, unless resorting to native code (unless they fixed that by now). So that would still need multiple pieces for very large arrays. – Joey Sep 04 '11 at 10:32
  • Yes, I would still use fixed-size IntBuffers which are individually memory-mapped to different files. This reduces the cost of growing the whole "collection". – Peter Lawrey Sep 04 '11 at 10:36
  • @Joey: Indeed... until now I used arrays of around 1 GB... thanks for pointing this out. – dagnelies Sep 04 '11 at 10:38
  • I admit that "the way to go" is splitting the big array into smaller ones ...but oh boy, I'm already afraid of the refactoring, because there is a lot going on with this array, and even a simple `System.arraycopy` used internally to move chunks has to be changed. – dagnelies Sep 04 '11 at 10:44
  • Also, what is the benefit of using IntBuffer instead of int[]? – dagnelies Sep 04 '11 at 10:55
  • @PeterLawrey is it necessary to use different files? I could just map to different areas in one file (?) – Karussell Dec 23 '12 at 00:19
  • @Karussell You are right that you can map different parts of the same file multiple times. I do this in a library I wrote https://github.com/peter-lawrey/Java-Chronicle – Peter Lawrey Dec 23 '12 at 22:55
  • @PeterLawrey yes of course I know your lib already :) (what is the license?) btw: do you know what the recommended strategy is if I need to increase the file size? full remap of all buffers (cleanup is a bit hacky) or new buffer(s) for new space (then the old buffers could point to invalid space, right?) http://stackoverflow.com/q/14011919/194609 – Karussell Dec 23 '12 at 22:59
  • @Karussell As Java doesn't really unmap previous mappings, I would suggest only adding new mappings, especially as each one is limited to Integer.MAX_VALUE in size. What I do is grow by a large chunk like 128 MB, as this has little impact on Linux (which is what I typically use). – Peter Lawrey Dec 23 '12 at 23:10
  • @PeterLawrey well, I thought that too. But the problem is that changing the file size could cause problems to mapped buffers (probably the file could be moved without the mapping being updated?). I couldn't find a document telling me that increasing the file size is ok (for e.g. linux) – Karussell Dec 24 '12 at 21:03
  • @Karussell I don't see why increasing the size would move the file or why moving the file would be a problem. The OS is responsible for mapping the pages of the file from disk to memory. You are not mapping a region of the disk, you are mapping a region of a file, wherever it is. How it does it shouldn't impact you too much unless it's heavily fragmented (i.e. for performance) – Peter Lawrey Dec 24 '12 at 22:25
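
A rough sketch of the grow-by-mapping-another-region approach discussed in these comments; the file name, region size, and class/method names are assumptions for illustration, not part of the answer. Each call maps the next 128 MB region of the same file, extending the file as needed, and previously mapped regions are left alone.

// Hypothetical illustration of growing a file by mapping additional regions of it.
import java.io.RandomAccessFile;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.util.ArrayList;
import java.util.List;

public class GrowableMappedFile {
    private static final long REGION_SIZE = 128L << 20;   // 128 MB per mapping

    private final FileChannel channel;
    private final List<MappedByteBuffer> regions = new ArrayList<MappedByteBuffer>();

    public GrowableMappedFile(String fileName) throws Exception {
        channel = new RandomAccessFile(fileName, "rw").getChannel();
    }

    // Extends the file by one region and maps it; earlier mappings stay valid.
    public MappedByteBuffer grow() throws Exception {
        long offset = (long) regions.size() * REGION_SIZE;
        MappedByteBuffer region =
                channel.map(FileChannel.MapMode.READ_WRITE, offset, REGION_SIZE);
        regions.add(region);
        return region;
    }
}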
0

I can't figure out a solution to your problem using arrays. If you want to create a new, bigger array, you have to keep the old one in memory until the data is copied to the new one, so you must have both in memory, even if only for a short time. (Correct me if I'm wrong.)

If you need to grow your int collection continuously, why don't you use a List<Integer> instead of an int[]?

Gigab0rt
  • 302
  • 4
  • 14