12

In java, an EnumSet stores the items it contains in a bitmask / bit vector using a long (RegularEnumSet) or long[] (JumboEnumSet). I have now come across a use case where I have many thousand domain Objects (let's call them Node), each of which will show all items of an enum (let's call that Flag) in an order that will vary per Object.

Currently I am storing the Order as Guava ImmutableSet, because that guarantees to retain insertion order. However, I have used the methods explained on this page to compare memory usage in an EnumSet<Flag>, an ImmutableSet<Flag> and a Flag[]. Here are the results when a) Flag has 64 enum items and b) all three variants contain all 64 items:

EnumSet: 32 bytes
ImmutableSet: 832 bytes
Array: 272 bytes

So my question is: is there a clever way to pack the enum ordering into a numeric value to get a memory footprint smaller than that of the array? If it makes a difference: in my use case I would assume that the ordering always contains all Enum items.

To clarify: my enum is much smaller than that and I don't have any memory problems as of now, nor is it likely that this situation will ever give me memory problems. It's just that this inefficiency bugs me, even on this microscopic level.

Update:

After suggestions from the various answers and comments I came up with this data structure that uses a byte array. Caveat: It doesn't implement the Set interface (doesn't check for unique values) and it won't scale to large enums beyond what a byte can hold. Also, the complexity is pretty awful, because Enum.values() has to be queried repeatedly (see here for a discussion of this problem), but here goes:

public class EnumOrdering<E extends Enum<E>> implements Iterable<E> {
    private final Class<E> type;
    private final byte[] order;

    public EnumOrdering(final Class<E> type, final Collection<E> order) {
        this.type = type;

        this.order = new byte[order.size()];

        int offset = 0;
        for (final E item : order) {
            this.order[offset++] = (byte) item.ordinal();
        }

    }

    @Override
    public Iterator<E> iterator() {
        return new AbstractIterator<E>() {
            private int offset = -1;
            private final E[] enumConstants = type.getEnumConstants();

            @Override
            protected E computeNext() {
                if (offset < order.length - 1) {
                    return enumConstants[order[++offset]];
                }
                return endOfData();
            }
        };
    }
}

The memory footprint is:

EnumOrdering:104

That's a pretty good result so far, thanks to bestsss and JB Nizet!

Update: I have changed the code to only implement Iterable, because anything else would require sensible implementations for equals / hashCode / contains etc.

sebkur
  • 658
  • 2
  • 9
  • 18
Sean Patrick Floyd
  • 292,901
  • 67
  • 465
  • 588
  • simple array of byte[] will do, byte[] contains the enum.ordinal. if you have more than 256 items you can use short[]/int[]. Alternatively you can pack the items into less than 8bits. You may have to take extra care of serialization, either way the code will be less than 200lines and it's quite trivial. – bestsss Jun 27 '12 at 14:04
  • if you do not need the insertion order,just use a single long - can contain up to enum w/ 64 elements, just like it's done in C. – bestsss Jun 27 '12 at 14:06
  • @bestsss if I didn't need the insertion order I'd use an EnumSet, which does exactly that – Sean Patrick Floyd Jun 27 '12 at 14:07
  • then use the `byte[]` to denote the adding order and one `long` for fast contains (i.e. no need to iterate), after you make the set immutable trim the `byte[]` to size. So a Set of 64items will have 64+8+2*object_header(~40) total memory footprint – bestsss Jun 27 '12 at 14:13
  • 2
    On the edit: you can 'cache' the `values()`, instead the Class `type` use the values array to obtain the class, at least won't need to create 'em on each iterator. Then go further and create static `WeakHashMap>`, WeakHashMap sucks a bit but it will do here. So you almost got similar stuff like SharedSecrets – bestsss Jun 27 '12 at 14:51
  • @bestsss yeah, I'd probably go with a Guava cache with weak keys, but as I said, this question is mostly out of academic interest – Sean Patrick Floyd Jun 27 '12 at 14:55
  • *Guava cache with weak keys* -- and it will leak (classloaders, that's it). The value type must be `SoftReference`, not just `Enum[]`. That's one of the very bad type of leak designs and it's very often in practice. – bestsss Jun 27 '12 at 14:58
  • Good lord, and there I thought everything was simple :-) – Sean Patrick Floyd Jun 27 '12 at 15:03
  • @Sean Patrick Floyd You do already a sort of "caching" as you store the enum values in the Iterator. Storing them in your EnumOrdering instead will make the class a bit more memory demanding and a bit faster. For real caching `CacheBuilder.weakKeys().softValues().build(new CacheLoader() {...})` should work nicely. – maaartinus Jun 27 '12 at 18:46

2 Answers2

6

is there a clever way to pack the enum ordering into a numeric value

Yes, you can represent an ordering as a numeric value, although to use it you need to convert back to a byte/int array. And since there are 64! possible orderings of 64 values, and 64! is bigger than Long.MAX_VALUE, you'd need to store the number in a BigInteger. I guess this would be the most memory-efficient way of storing the ordering, although what you gain in memory you lose in time due to having to convert the number to an array.

For algorithms to convert between number/array representations, see this question.

Here's an alternative to the above, don't know if it's as efficient as on that one, and you'll have to convert the code from int to BigInteger-based, but it should be enough to give you the idea:

/**
   * Returns ith permutation of the n numbers [from, ..., to]
   * (Note that n == to - from + 1).
   * permutations are numbered from 0 to n!-1, if i is outside this
   * range it is treated as i%n! 
   * @param i
   * @param from
   * @param n
   * @return
   */
  public static int[] perm(long i, int from, int to)
  {
    // method specification numbers permutations from 0 to n!-1.
    // If you wanted them numbered from 1 to n!, uncomment this line.
    //  i -= 1;
    int n = to - from + 1;

    int[] initArr  = new int[n];             // numbers [from, ..., to]
    int[] finalArr = new int[n];             // permutation of numbers [from, ..., to]

    // populate initial array
    for (int k=0; k<n; k++)
      initArr[k] = k+from;

    // compute return array, element by element
    for (int k=0; k<n; k++) {
      int index = (int) ((i%factorial(n-k)) / factorial(n-k-1));

      // find the index_th element from the initial array, and
      // "remove" it by setting its value to -1
      int m = convertIndex(initArr, index);
      finalArr[k] = initArr[m];
      initArr[m] = -1;
    }

    return finalArr;
  }


  /** 
   * Helper method used by perm.
   * Find the index of the index_th element of arr, when values equal to -1 are skipped.
   * e.g. if arr = [20, 18, -1, 19], then convertIndex(arr, 2) returns 3.
   */
  private static int convertIndex(int[] arr, int index)
  {
    int m=-1;
    while (index>=0) {
      m++;
      if (arr[m] != -1)
        index--;
    }

    return m;
  }

Basically you start with your init array in its natural ordering, then loop over your final array, each time calculating which of the remaining elements should be placed next. This version "deletes" elements from the init array by setting the value to -1. It would probably be more intuitive to use a List or LinkedList, I've just pasted this from some old code I had lying around.

With the above methods and with this as main:

public static void main(String[] args) {
    int n = (int) factorial(4);
    for ( int i = 0; i < n; i++ ) {
      System.out.format( "%d: %s\n", i, Arrays.toString( perm(i, 1, 4 ) ) );
    }
}

You get the following output:

0: [1, 2, 3, 4]
1: [1, 2, 4, 3]
2: [1, 3, 2, 4]
3: [1, 3, 4, 2]
4: [1, 4, 2, 3]
5: [1, 4, 3, 2]
6: [2, 1, 3, 4]
7: [2, 1, 4, 3]
8: [2, 3, 1, 4]
9: [2, 3, 4, 1]
10: [2, 4, 1, 3]
11: [2, 4, 3, 1]
12: [3, 1, 2, 4]
13: [3, 1, 4, 2]
14: [3, 2, 1, 4]
15: [3, 2, 4, 1]
16: [3, 4, 1, 2]
17: [3, 4, 2, 1]
18: [4, 1, 2, 3]
19: [4, 1, 3, 2]
20: [4, 2, 1, 3]
21: [4, 2, 3, 1]
22: [4, 3, 1, 2]
23: [4, 3, 2, 1]

Here is an executable version on ideone.

Judging by BigInteger.bitLength(), it should be possible to store an ordering of 64 elements in no more than 37 bytes (plus the overhead of using a BigInteger instance). I don't know if it's worth the trouble, but it makes a nice exercise!

Community
  • 1
  • 1
OpenSauce
  • 8,533
  • 1
  • 24
  • 29
  • Nice answer, although I'd prefer it if you provided a few lines of sample code for the conversion (the answer you link to is beyond my comprehension, I'm afraid) – Sean Patrick Floyd Jun 28 '12 at 08:19
  • @SeanPatrickFloyd: OK, I dug around and found an example in one of my old projects, have updated the answer. Looking at that linked answer again, it's actually not the same - it uses a different representation. – OpenSauce Jun 28 '12 at 09:38
2

If you have 64 enum values, you could use a byte array where each byte would contain the ordinal of one of the enum items. This would need 3 * (64 + 16) = 240 bytes for 3 arrays of 64 bytes (16 bytes being the cost of a byte array, regardless of its length).

This still wastes space, as each byte is able to store 8 bits, but you only need 6 to store numbers from 0 to 63. So you could apply a clever packing algorithm that would use 3 bytes (24 bits) to store 4 enum ordinals. This would lead to 3 * (64 * 3 / 4 + 16) = 192 bytes.

I suck at byte manipulations, so I'll leave the implementation as an exercise for you.

JB Nizet
  • 678,734
  • 91
  • 1,224
  • 1,255
  • it still takes some extra 8 bytes to perform contains (or you have to scan the byte[] each time). Basically I offered packing the bytes in the 1st comment but rarely it is worthy the effort for so tiny amounts of data. There could be cleverer schemes to pack like deltas between the added elements with variable bit-length. – bestsss Jun 27 '12 at 14:38
  • 1
    I started with the hypothesis specified in the question: *in my use case I would assume that the ordering always contains all Enum items*. So a contains operation is not needed. It always returns true. – JB Nizet Jun 27 '12 at 14:43
  • True that, it's just not a set, then. – bestsss Jun 27 '12 at 14:47
  • @JBNizet true, I was mostly concerned about how to reduce the memory footprint, not to implement a Collection with reasonable performance – Sean Patrick Floyd Jun 27 '12 at 14:47
  • @bestsss I never asked for a set (although I agree that my scenario implies one) – Sean Patrick Floyd Jun 27 '12 at 14:48
  • @SeanPatrickFloyd, I've already agreed, my reading comprehension sucks. – bestsss Jun 27 '12 at 14:54