2

I am trying to generate a set of 2D int arrays that should (at least) be 6x6. Each array stores values from 0-6. I tried using a simple HashSet<int[][]> to store them (with 512MB of memory), and I quickly got the error

Exception in thread "main" java.lang.OutOfMemoryError: GC overhead limit exceeded

only a short way into the program.

The options that I've thought up for storing the arrays:

  1. Store them as a long in base 7. This would only work for up to 24 (24.3717) digits, since long cannot be more than 2^63 bits.

  2. Store them as a String (e.g. {{0, 0, 0, 1}, {3, 6, 2, 0}} would become "00013620"). This would only take up 4x less space (I think), because a char is still 1 byte.

  3. Use something like BitSet or BigInteger? I have no idea what each is or how they work.

So my question is: What is the smallest way to store a 6 x 6 array of values from 0 - 6? Do the options above work, or is there an easier way?


Note: I have 8GB of memory that I can use, if it becomes necessary.

My code (it has to do with chess, if you must know): n is the size of the array (width and height), should be able to go up to (or past) 6.

public static HashSet<int[][]> getBoards(int[][] data, int zero, int num) {
    HashSet<int[][]> ret = new HashSet<int[][]>(0);

    if (zero == num) {
        ret.add(data);
    } else if (zero == 0) {
        for (int y = 0; y < n; y++) {
            for (int x = 0; x < n; x++) {
                for (int i = 1; i < 7; i++) {
                    int[][] d0 = new int[n][n];
                    d0[y][x] = i;
                    ret.addAll(getBoards(d0, 1, num));
                }
            }
        }
    } else {
        for (int y = 0; y < n; y++) {
            for (int x = 0; x < n; x++) {
                if (data[y][x] == 0) continue;

                HashSet<int[]> moves = getMoves(data[y][x], x, y);

                while (moves.iterator().hasNext()) {
                    int[] m = moves.iterator().next();

                    for (int i = 0; i < 6; i++) {
                        int[][] d0 = arrayCopy(data);
                        d0[m[0]][m[1]] = i;

                        ret.addAll(getBoards(d0, zero + 1, num));
                    }
                }
            }
        }
    }

    return ret;
}

public static HashSet<int[]> getMoves(int piece, int xPos, int yPos) {
    HashSet<int[]> ret = new HashSet<int[]>(0);

    for (int y = 0; y < n; y++) {
        for (int x = 0; x < n; x++) {
            if (x == xPos && y == yPos) continue;

            switch (piece) {
            case 1:
                if (y - yPos == 1 && Math.abs(x - xPos) == 1) ret.add(new int[] {y, x});
                break;
            case 2:
                if (Math.abs(y - yPos) + Math.abs(x - xPos) == 3 && x != xPos && y != yPos) ret.add(new int[] {y, x});
                break;
            case 3:
                if (Math.abs(y - yPos) == Math.abs(x - xPos)) ret.add(new int[] {y, x});
                break;
            case 4:
                if (y == yPos || x == xPos) ret.add(new int[] {y, x});
                break;
            case 5:
                if (Math.abs(y - yPos) == Math.abs(x - xPos) || y == yPos || x == xPos) ret.add(new int[] {y, x});
                break;
            case 6:
                if (Math.abs(y - yPos) <= 1 && Math.abs(x - xPos) <= 1) ret.add(new int[] {y, x});
                break;
            default:
                throw new IllegalArgumentException("Unknown Piece Number (" + piece + ")");
            }
        }
    }

    return ret;
}

The full error:

Exception in thread "main" java.lang.OutOfMemoryError: GC overhead limit exceeded
at ChessGenerator.arrayCopy(ChessGenerator.java:120)
at ChessGenerator.getBoards(ChessGenerator.java:71)
at ChessGenerator.getBoards(ChessGenerator.java:74)
at ChessGenerator.getBoards(ChessGenerator.java:74)
at ChessGenerator.getBoards(ChessGenerator.java:74)
at ChessGenerator.getBoards(ChessGenerator.java:74)
at ChessGenerator.getBoards(ChessGenerator.java:74)
at ChessGenerator.getBoards(ChessGenerator.java:74)
at ChessGenerator.getBoards(ChessGenerator.java:56)
at ChessGenerator.main(ChessGenerator.java:23)

EDIT: As @Louis pointed out, my use of HashSets was causing the above error, however, I still run out of memory

Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
at ChessGenerator.arrayCopy(ChessGenerator.java:119)
at ChessGenerator.getBoards(ChessGenerator.java:70)
at ChessGenerator.getBoards(ChessGenerator.java:73)
at ChessGenerator.getBoards(ChessGenerator.java:73)
at ChessGenerator.getBoards(ChessGenerator.java:73)
at ChessGenerator.getBoards(ChessGenerator.java:73)
at ChessGenerator.getBoards(ChessGenerator.java:73)
at ChessGenerator.getBoards(ChessGenerator.java:73)
at ChessGenerator.getBoards(ChessGenerator.java:58)
at ChessGenerator.main(ChessGenerator.java:23)
ricky3350
  • 1,710
  • 3
  • 20
  • 35
  • @David Not necessarily – ricky3350 May 05 '15 at 22:04
  • I can't even begin to imagine why storing 6x6 arrays of int values would result in out of memory. Can you show some code? – Aify May 05 '15 at 22:05
  • `HashSet` I don't think you want a `HashSet` for array's `equals()` and `hashCode()`. Arrays in java are implementing `hashCode()` and `equals()` to match object identity. Specifically, two objects [1,2,3] and [1,2,3] will NOT be equal, and will (probably) have different hashCode – amit May 05 '15 at 22:05
  • You should read [this question](http://stackoverflow.com/questions/1393486/error-java-lang-outofmemoryerror-gc-overhead-limit-exceeded). It actually seems like you not exactly running out of memory, so making the objects smaller may not help. – David says Reinstate Monica May 05 '15 at 22:10
  • @David Why would that be happening, and how would I fix it? – ricky3350 May 05 '15 at 22:12
  • @ricky3350 I can't tell you anything other than whats in that question. I'm not familiar with this issue. – David says Reinstate Monica May 05 '15 at 22:14
  • @amit Right now, I'm more worried about generating the list of arrays rather than having it be unique. – ricky3350 May 05 '15 at 22:22
  • @David Thank you, as far as I can tell that was the problem. The solution was to use ArrayList instead of HashSet. – ricky3350 May 05 '15 at 22:39

3 Answers3

1

If you were expecting the HashSet to keep only unique int[][]s, and eliminate duplicates, that's not going to work -- the equals and hashCode implementation for int[][] (and all arrays) is identity-based. If you had been depending on uniqueness to keep the number of distinct arrays small, that's not going to work; you're going to have to wrap them in a type implementing a correct hashCode and equals.

Louis Wasserman
  • 191,574
  • 25
  • 345
  • 413
  • But is the HashSet or similar class the smallest way? – ricky3350 May 05 '15 at 22:09
  • 1
    First you have to make sure your code is actually doing what you want, then you can worry about that. Right now, I'm almost positive it's not doing what you mean it to. `HashSet` incurs a decent amount of overhead per element, but if you need uniqueness, you need uniqueness, and if you want to keep things unique relative to the contents of the `int[][]`, you _need_ a wrapper type. – Louis Wasserman May 05 '15 at 22:10
  • Right now, I'm more worried about generating the list of arrays rather than having it be unique. – ricky3350 May 05 '15 at 22:13
  • In that case, an `ArrayList` would consume somewhat less memory. Also, if it helps, one-dimensional arrays could take significantly less memory than two-dimensional arrays, if you know the number of rows in advance. – Louis Wasserman May 05 '15 at 22:14
  • Thank you, I used `ArrayList` and the program runs perfectly. Can you explain to me your first comment? I would like them to be unique. – ricky3350 May 05 '15 at 22:38
  • HashSet won't enforce uniqueness on arrays because the definition of equals and hashCode for arrays considers two different array objects, even if they have the same contents, to be different. You would have to wrap them in a type that defines its equals and hashCode in terms of the array contents to uniquify them. – Louis Wasserman May 05 '15 at 22:41
  • That's not the issue here, the issue is the trillions of arrays being generated. – amit May 05 '15 at 22:42
  • Can you give me an example of such a type? – ricky3350 May 05 '15 at 22:42
  • You'd essentially have to write your own. Something like `class TwoDIntArray { private final int[][] array; TwoDIntArray(int[][] array) { this.array = array; } public int hashCode() { return Arrays.deepHashCode(array); } public boolean equals(Object o) { return o instanceof TwoDIntArray && Arrays.deepEquals(array, ((TwoDIntArray) o).array); } }` – Louis Wasserman May 05 '15 at 22:44
1

You seem to be creating a LOT of boards, it is hard to follow, but it seems you are basically generating a large portion of all arrays of size 6X6 where each cell can has any value of 1,2,..,6.

The number of such arrays is 6^36 ~= 10^28.

This means, even if each array will be only one byte (it cannot be), you are still going to need 10^16 TB to hold all of them.

I suggest you look for an alternative that does not include generating all possible arrays explicitly.


As a side note, that lowest possible number of bits to represent your object is ceil(log_2(6^36)) = 94, but that's going to be a lot of work to get this optimal result, and I won't advise it.

amit
  • 175,853
  • 27
  • 231
  • 333
  • It doesn't (shouldn't) create the list of _all_ of the boards. Even if it does, would it work for 5x5? 4x4? It definitely works for 3x3. – ricky3350 May 05 '15 at 22:47
  • @ricky3350 Point is - you are generating WAY too many arrays, you need to think of a way to acieve what you want without generating all of them explicitly. – amit May 05 '15 at 22:49
1

The most straightforward yet still memory-efficient way is to store each array as two longs, with each field taking up 3 bits (that's 3*36=108 useful bits in total, with an overhead of 20 unused bits). Although the theoretical limit is less than that, you'd almost certainly want your structures aligned to word boundaries, therefore you're not really losing anything. What you win though is that accessing individual fields is simple and fast, only requiring bit-masking and shifting operations.

I would also take a look at off-heap storage options, to eliminate all object overhead.

biziclop
  • 48,926
  • 12
  • 77
  • 104