1

For a project I'm working on, I try to create a multi-dimensional pivot on large data sets. I have all the keys I want to use as ints, so basically, I want to return a set of

( int1, int2, int3, .. intN ) -> (Aggregate1, Aggregate2, ... , AggregateM)

I cannot use a N-dimensional array, as it might get huge and probably will be sparse. I've looked a Trove, but they do not have a Multi-key map. Apache commons has a multi-key map, but that is for Objects; that would probably work, but seems less interesting as the ints will get auto-boxed to Integers and vice versa.

Does anyone know of a primitive multi-key map implementation? (That maps to objects?)

Or, does anyone have great hints, maybe there is a better approach to my problem?

[edit] Insertion time is less interesting, I need the performance on the lookup, as the map will be heavily used to lookup values.

[edit2] Thanks for all the answers. My implemenation choice is a custom class containing an int[], immutable so the hashcode can be calculated on construction time.

private static class MultiIntKey
{
    int[] ints;

    private int hashCode;

    MultiIntKey( int[] ints )
    {
        this.ints = ints;
        this.hashCode = Arrays.hashCode( this.ints );
    }

    @Override
    public int hashCode()
    {
        return this.hashCode;
    }

    @Override
    public boolean equals( Object obj )
    {
        if ( this == obj )
        {
            return true;
        }
        if ( obj == null )
        {
            return false;
        }
        if ( this.getClass() != obj.getClass() )
        {
            return false;
        }
        MultiIntKey other = (MultiIntKey) obj;
        if ( this.hashCode != other.hashCode )
        {
            return false;
        }
        if ( !Arrays.equals( this.ints, other.ints ) )
        {
            return false;
        }
        return true;
    }
}
Rob Audenaerde
  • 19,195
  • 10
  • 76
  • 121
  • 1
    You might want to look at the source code of `SparseArray` and modify it to use multi-key. – Aleks G Sep 05 '14 at 19:11
  • What precisely do you mean by "multi-key"? – chrylis -cautiouslyoptimistic- Sep 05 '14 at 19:12
  • Multi-key: I have multiple things that combine to one key. Apache Commons, for example, has a Multi-Key map where the CombinedKey consists of `Objects`. – Rob Audenaerde Sep 05 '14 at 19:13
  • @AleksG I think you mean the Android SparseArray? – Rob Audenaerde Sep 05 '14 at 19:15
  • Actually, yes :) Sorry for the confusion. – Aleks G Sep 05 '14 at 20:01
  • So why exactly do you want to avoid Objects? – Cratylus Sep 05 '14 at 21:00
  • 3
    There is not enough info to propose an approach. Number of keys is statically known or not? Target most probable number of keys? Size of the domain / typical map key cardinality by each dimension? – leventov Sep 05 '14 at 22:16
  • Good questions @leventov. Unfortunately, I will not know these quantities on beforehand, as it mainly depends on user input. The number of dimensions will be limitid to 8 in practice. – Rob Audenaerde Sep 10 '14 at 07:07
  • @Cratylus Because object creation takes time. (see for example: http://stackoverflow.com/q/15582944/461499 ) If I could get away with using only primitives I figure it would be faster – Rob Audenaerde Sep 10 '14 at 07:09
  • @RobAu:But how is the link you posted relevant to your use case.That OP's test was incrementing a Double which creates Double objects in the for loop. That is what is slower vs using plain primitives. If you just instantiate a Double and store it as a key then IMHO you are optimizing prematurely. Creating a single object is cheap. Too cheap to think about it. – Cratylus Sep 10 '14 at 17:54
  • @Cratylus, each time I want to look-up something in the map I have to create a key-object. And I have millions of them, so I try to create as little objects as possible. I could not get around one (see above), but its much cheaper than the general MultiKeyMap apache commons provides (which uses objects for each element of the key) – Rob Audenaerde Sep 11 '14 at 09:07

4 Answers4

4

Apache commons has a multi-key map, but that is for Objects; that would probably work, but seems less interesting as the ints will get auto-boxed to Integers and vice versa.

Sure, it makes no sense to use N objects while trying to avoid one.

  • If you keys are small, consider packing them into a single int or long.
  • If they repeat a lot, consider a TIntObjectMap<TIntObjectMap<Value>> using trove4j, possibly with more nesting.
  • Otherwise, simply create a trivial object encapsulating all the ints. An object overhead is a few bytes, which is not that bad when compared to 4*N. A map hash a big overhead anyway...

If your map is immutable, go for Guava's ImmutableMap. Look at Guava Table, it's 2D only, but it may help to save a bit.


Only if you're sure you need to optimize a lot (have you done some benchmarking or profiling?) and you don't need a fully fledged map, think about your own implementation based on some int[], where you place all the keys in sequence. Most probably you'll find out that it wasn't worth it, but it's a good exercise. :D

maaartinus
  • 44,714
  • 32
  • 161
  • 320
1

Each key can be:

IntBuffer.wrap(new int[] { value1, value2, value3 })

IntBuffer's hashCode, equals, and compareTo methods depend on its contents, so they will work as HashMap or TreeMap keys. (Technically those methods depend on the remaining elements in the buffer, so just make sure you never change the position or limit of any of the IntBuffers you create.)

The one caveat is that order matters: IntBuffer.wrap(new int[] { 1, 2 }) is not equal to IntBuffer.wrap(new int[] { 2, 1 }).

VGR
  • 40,506
  • 4
  • 48
  • 63
  • Ah, interesting idea :). It still is object creation, but I think I cannot do without. The order of my `int`s is fixed (as it should be in an array). – Rob Audenaerde Sep 05 '14 at 19:30
  • 1
    @RobAu It's possibly good for simplicity, but the memory overhead is huge. You create two objects instead of one, an `int[]` and a `HeapIntBuffer` which itself has a couple of fields. If you want to save typing, use [Lombok](http://projectlombok.org)'s `@Value {int value1; int value2;}` and you're done (everything's immutable, and you get getters, constructor, and equals and hashCode). – maaartinus Sep 05 '14 at 21:29
0

A string key with ( int1, int2, int3, .. intN ) == (int1 + "" int2 + ""..).intern()

so all keys from same table via intern so very few new objects. Need to use intern every time you want to the key as its dynamic.

Community
  • 1
  • 1
tgkprog
  • 4,493
  • 4
  • 41
  • 70
  • The string option will work, the int1*10 etc option definitely will not. Think (1, 2) and (30, 1) - these will produce the same key - 50. – Aleks G Sep 05 '14 at 20:05
  • i agree, mid way started typing the string thing and then forgot to remove first part – tgkprog Sep 06 '14 at 10:25
-1

Easiest ways to do:

  1. Convert to JSON / Harness MongoDB for searching
  2. Create custom class to contain all keys, define its hashcode & equals method, and use a Map
ControlAltDel
  • 33,923
  • 10
  • 53
  • 80
  • 3
    1. Converting each set of integers in the lookup to `String` will slow things down terribly, I think. – Rob Audenaerde Sep 05 '14 at 19:11
  • 2. This might be a part of the solutions, although I would prefer to stay away from lots of objects creating. – Rob Audenaerde Sep 05 '14 at 19:12
  • 2
    @RobAu very well. It sounds like you might need some sort of a custom solution then. One piece of advice I would give: You've pointed out inefficiencies with both of my suggestions, which is valid. But it is hard to optimize code before you start running it and finding out where your bottlenecks really are – ControlAltDel Sep 05 '14 at 19:26
  • 1
    @RobAu Note that the overhead for a single object is pretty negligible when compared to the first proposal. If someone's looking for optimizing a map, I'd never propose any DB as it means an order of magnitude slowdown. – maaartinus Sep 05 '14 at 20:02