9

I am trying to find a solution/workaround for slicing extremely large arrays without creating new copies. Here is my problem.

Suppose I have a large array of double/int of size 100 million or more. I am storing many different arrays representing different things in a single extremely large array to significantly save on memory usage. Hence, instead of having 1 million arrays each of size 100, I have a single array of size 100 million. I store indices (start and stop) to keep track of my data.

I want to get thousands of slices with size 100. If I use the method Arrays.copyOfRange() to get slices, it defeats the purpose of putting everything in a single large array since each slice is a new copy eating up memory.

I have legacy code (in excess of 1 million lines written over the years by many people) that works with its own data (which are smaller arrays). It is not possible to modify the existing code to work with indices (begin, end) in a large array.

If I could somehow return the original array such that the returned array is a reference (or pretends to be) where index 0 is some arbitrary index in the original large array, it would be great.

In C/C++, I can easily return a pointer with a specific offset and length with which the calling code can work.

What are my options in Java?

Edit: I looked at the following similar question, but it does not contain a response to my question. How to get a sub array of array in Java, without copying data?

Community
  • 1
  • 1
Santosh Tiwari
  • 1,167
  • 2
  • 14
  • 26
  • 3
    It's not possible to *slice* an array in Java. – Luiggi Mendoza Mar 22 '13 at 19:36
  • For curiosity, why can't you modify the existing code? – Christoffer Hammarström Mar 22 '13 at 19:37
  • I did not write it. It will take huge amount of time to modify legacy code (few months to few years). It may introduce difficult to find errors. etc. – Santosh Tiwari Mar 22 '13 at 19:39
  • 1
    The only structures I know that can give you a slice of them are `TreeSet` and `TreeMap`, but I'm not sure if they apply to your problem. – Luiggi Mendoza Mar 22 '13 at 19:42
  • 3
    "I am storing many different arrays representing different things in a single extremely large array to significantly save on memory usage" -- how much memory do you think this saves you? – parsifal Mar 22 '13 at 19:55
  • Significant savings (memory usage reduced by about 50%). Arrays in Java are objects where header takes 12 bytes + additional 8 bytes for storing the reference. etc. Having 1 million arrays incurs huge overhead. – Santosh Tiwari Mar 22 '13 at 19:58
  • 1
    @SantoshTiwari - The 8 bytes for storing the reference is a wash, because you'd need to have a reference for each slice anyway. It seems like 12MB (12 byte header for 1 million arrays) is not a huge overhead when you're talking about 400MB of data (1 million arrays x 100 elements x 4 bytes/element). If you're seeing memory reduction of 50%, something else is going on. – Ted Hopp Mar 22 '13 at 20:00
  • Well, I did not tell you the complete story. But, having a single large array instead of a huge object graph with same number of real numbers is making a significant difference. I am using Eclipse MAT to measure the retained size. – Santosh Tiwari Mar 22 '13 at 20:12
  • 1
    Well, considering that Java does not allow pointers to arbitrary blocks of memory like C, and you can't change this legacy code to use your own data structures, it seems like your only choice is to buy more memory. – parsifal Mar 22 '13 at 20:25
  • Yeah, I am now thinking of a middle ground (have several multi-dimensional arrays like double[][][]) which might be helpful. I can still return arrays without creating new copies. – Santosh Tiwari Mar 22 '13 at 20:29
  • 1
    Is the 50% savings perhaps coming from keeping a reference to the original array as well as to the million small slices? Once you have the slices, you don't need the original and you should be sure that all references to it either go out of scope or are set to `null`. – Ted Hopp Mar 22 '13 at 21:01
  • Multidimensional arrays does not help either. – Santosh Tiwari Mar 22 '13 at 21:48
  • possible duplicate of [Grab a segment of an array in Java without creating a new array on heap](http://stackoverflow.com/questions/1100371/grab-a-segment-of-an-array-in-java-without-creating-a-new-array-on-heap) – Ciro Santilli OurBigBook.com Mar 27 '15 at 08:19

4 Answers4

3

For an array of int values, you can wrap in an IntBuffer. You can also wrap a slice of an array.

int[] largeArray = . . .

// create a slice containing the elements 100 through 149 (50 elements):
IntBuffer slice = IntBuffer.wrap(largeArray, 100, 50);
Ted Hopp
  • 232,168
  • 48
  • 399
  • 521
  • The slice would create a new copy of the range that I need (IntBuffer.get()) so it would defeat the purpose. – Santosh Tiwari Mar 22 '13 at 19:42
  • @SantoshTiwari - I think you commented on an obsolete version of my answer. When you wrap a part of an array as shown, no data are copied. – Ted Hopp Mar 22 '13 at 19:43
  • The IntBuffer.wrap() method will return an IntBuffer object which cannot be passed to existing code. If I could wrap the returned buffer into an int[], the approach might work. Thanks. – Santosh Tiwari Mar 22 '13 at 19:43
  • OK, there is an array() method that can return an array backed by the original buffer. It might work. I will study this option. Thanks. – Santosh Tiwari Mar 22 '13 at 19:46
  • 1
    @SantoshTiwari - The `array()` method returns the entire backing array, not the slice. You'd have to use `get(int[] dest)` to retrieve an array that is just the slice. In Java, an `int[]` cannot be an alias for part of another `int[]`. – Ted Hopp Mar 22 '13 at 19:48
  • Yup. I already tried that. IntBuffer does not help in my case. – Santosh Tiwari Mar 22 '13 at 19:55
2

How about creating a wrapper class that holds references to your original array and your start index, and using an instance of this wrapper to access your original array.

Below code might not be syntactically correct, but it should give you the idea.

public class ArraySlice(){
  private int startIndex;
  private int[] originalArray;
  //getters-setters

  public ArraySlice(int[] originalArray, int startIndex){
    //Initialize
  }

  public int get(int index){
    return originalArray[startIndex+index]
  }
}
Erkan Haspulat
  • 12,032
  • 6
  • 39
  • 45
1

Your best option is to store the indexes of the slices in a separate structure, such as an array storing those indexes.

This way, you do not instantiate large arrays being a partition of the whole data array.

Jean Logeart
  • 52,687
  • 11
  • 83
  • 118
1

Can you create your own object that stores index, size and reference to the original array?

class CustomizedArray {
  int startIndex;
  int size;
  int[] originalArray;

  public CustomizedArray(int startIndex, int size, int[] originalArray) {
    this.startIndex = startIndex;
    this.size = size;
    this.originalArray = originalArray;
   }

   public int getIndex(int index) {
     int originalIndex = startIndex+index;
     if(index <0 || originalIndex >= startIndex+size) {
        throw new IndexOutOfBoundException();
     }
     return originalArray[originalIndex];


}

Then you can store CustomizedArray in some bigger structure.

zubergu
  • 3,646
  • 3
  • 25
  • 38