36

Is it (performance-wise) better to use Arrays or HashMaps when the indexes of the Array are known? Keep in mind that the 'objects array/map' in the example is just an example, in my real project it is generated by another class so I cant use individual variables.

ArrayExample:

SomeObject[] objects = new SomeObject[2];
objects[0] = new SomeObject("Obj1");
objects[1] = new SomeObject("Obj2");

void doSomethingToObject(String Identifier){
    SomeObject object;
    if(Identifier.equals("Obj1")){
        object=objects[0];
    }else if(){
        object=objects[1];
    }
    //do stuff
}

HashMapExample:

HashMap objects = HashMap();
objects.put("Obj1",new SomeObject());
objects.put("Obj2",new SomeObject());

void doSomethingToObject(String Identifier){
    SomeObject object = (SomeObject) objects.get(Identifier);
    //do stuff
}

The HashMap one looks much much better but I really need performance on this so that has priority.

EDIT: Well Array's it is then, suggestions are still welcome

EDIT: I forgot to mention, the size of the Array/HashMap is always the same (6)

EDIT: It appears that HashMaps are faster Array: 128ms Hash: 103ms

When using less cycles the HashMaps was even twice as fast

test code:

import java.util.HashMap;
import java.util.Random;

public class Optimizationsest {
private static Random r = new Random();

private static HashMap<String,SomeObject> hm = new HashMap<String,SomeObject>();
private static SomeObject[] o = new SomeObject[6];

private static String[] Indentifiers = {"Obj1","Obj2","Obj3","Obj4","Obj5","Obj6"};

private static int t = 1000000;

public static void main(String[] args){
    CreateHash();
    CreateArray();
    long loopTime = ProcessArray();
    long hashTime = ProcessHash();
    System.out.println("Array: " + loopTime + "ms");
    System.out.println("Hash: " + hashTime + "ms");
}

public static void CreateHash(){
    for(int i=0; i <= 5; i++){
        hm.put("Obj"+(i+1), new SomeObject());
    }
}

public static void CreateArray(){
    for(int i=0; i <= 5; i++){
        o[i]=new SomeObject();
    }
}

public static long ProcessArray(){
    StopWatch sw = new StopWatch();
    sw.start();
    for(int i = 1;i<=t;i++){
        checkArray(Indentifiers[r.nextInt(6)]);
    }
    sw.stop();
    return sw.getElapsedTime();
}



private static void checkArray(String Identifier) {
    SomeObject object;
    if(Identifier.equals("Obj1")){
        object=o[0];
    }else if(Identifier.equals("Obj2")){
        object=o[1];
    }else if(Identifier.equals("Obj3")){
        object=o[2];
    }else if(Identifier.equals("Obj4")){
        object=o[3];
    }else if(Identifier.equals("Obj5")){
        object=o[4];
    }else if(Identifier.equals("Obj6")){
        object=o[5];
    }else{
        object = new SomeObject();
    }
    object.kill();
}

public static long ProcessHash(){
    StopWatch sw = new StopWatch();
    sw.start();
    for(int i = 1;i<=t;i++){
        checkHash(Indentifiers[r.nextInt(6)]);
    }
    sw.stop();
    return sw.getElapsedTime();
}

private static void checkHash(String Identifier) {
    SomeObject object = (SomeObject) hm.get(Identifier);
    object.kill();
}

}

Timotheus
  • 2,250
  • 5
  • 29
  • 39
  • 7
    Waitaminnit; if you "really need performance" you would already know the answer to this question because you would have profiled your app to determine that this Hashmap vs. Array issue was actually relevant. If you have not, you need to do so to determine this really does affect your performance and is not actually a waste of time. – Dour High Arch Jun 24 '11 at 00:20
  • 3
    Good point, but as a novice programmer it's always nice to get some reflection in case you miss something or get suggestions on better strategies – Timotheus Jun 24 '11 at 00:26
  • How often is doSomethingToObject called? And yes, the advice, a novice programmer should take seriously, is Dour High Arch's advice. – user unknown Jun 24 '11 at 00:30
  • 4
    if you "really need performance" you should not write in java... –  Jun 24 '11 at 00:47
  • @Franklin: Huh? Java is quite fast these days. – Daniel Pryden Jun 24 '11 at 00:53
  • @Franklin I've seen benchmarks where Java was nearly equal to C++ – Timotheus Jun 24 '11 at 01:31
  • @Dasdasd there are lies, damn lies, statistics and benchmarks... –  Jun 24 '11 at 02:35
  • 3
    Dasdasd, you have to benchmark your actual app running with real data. Making up fake "benchmarks" using two sample points will only mislead you. You must benchmark your whole application and optimize bottlenecks, not pick lines of code that look like fun to change. If this issue is not a bottleneck, optimizing it will not improve performance no matter how fast you make it. – Dour High Arch Jun 24 '11 at 16:37
  • If there values are known, I'd suggest making the values an enum and using an EnumMap – Michael Krussel Jun 24 '11 at 17:24
  • @Dour High Arch his question is valid, your point isn't, your logic of writing an app and then profiling it to determine bottlenecks is backwards in the context of this question, perhaps SO isn't the right place to ask a question like this but its valid to want to learn the underlying mechanics / performance of a specific data structure, everything shouldn't be done using trial and error, which is basically what you are suggesting – Rick Nov 23 '11 at 22:23
  • 2
    Your benchmarking is flawed. You need to do each test in its own JVM invocation. As is now, if you reversed ProcessArray and ProcessHash you'll see different values. – Steve Kuo Jan 23 '13 at 17:40

6 Answers6

44

HashMap uses an array underneath so it can never be faster than using an array correctly.

Random.nextInt() is many times slower than what you are testing, even using array to test an array is going to bias your results.

The reason your array benchmark is so slow is due to the equals comparisons, not the array access itself.

HashTable is usually much slower than HashMap because it does much the same thing but is also synchronized.

A common problem with micro-benchmarks is the JIT which is very good at removing code which doesn't do anything. If you are not careful you will only be testing whether you have confused the JIT enough that it cannot workout your code doesn't do anything.

This is one of the reason you can write micro-benchmarks which out perform C++ systems. This is because Java is a simpler language and easier to reason about and thus detect code which does nothing useful. This can lead to tests which show that Java does "nothing useful" much faster than C++ ;)

Mooncrater
  • 4,146
  • 4
  • 33
  • 62
Peter Lawrey
  • 525,659
  • 79
  • 751
  • 1,130
  • 4
    good point, Java is basically smart enough to do nothing if its not needed – Rick Nov 23 '11 at 22:26
  • @PeterLawrey is there a way to tell which code was "JIT"ed out? I usually use caliper framework for this benchmarks, but they suggest the same thing - "be careful".. – Eugene Apr 19 '13 at 07:48
  • @Eugene The way I tell is the code is now impossibly fast. e.g. taking much less than one clock cycle per iteration. This shows up best on repeated tests. – Peter Lawrey Apr 21 '13 at 20:23
5

arrays when the indexes are know are faster (HashMap uses an array of linked lists behind the scenes which adds a bit of overhead above the array accesses not to mention the hashing operations that need to be done)

and FYI HashMap<String,SomeObject> objects = HashMap<String,SomeObject>(); makes it so you won't have to cast

ratchet freak
  • 47,288
  • 5
  • 68
  • 106
  • 2
    "arrays are faster" is at best a gross oversimplification, and at worst flat-out wrong. Arrays and hashtables are apples and oranges. Lookup of a value in an array given its array index and lookup of an item in a hashtable given its key are both asymptotically equivalent: O(1) (well, amortized O(1) in the hashtable case). But looping through an array to find a given key is much worse: O(n/2) on average, which is asymptotically O(n). True, the array implementation may win *in this case*, but your answer implies that it's a general rule. – Daniel Pryden Jun 24 '11 at 01:05
  • @daniel well the "when the indexes of the Array are known" points to arrays being the winner but I'll edit the caveat explicitly in the answer – ratchet freak Jun 24 '11 at 01:29
2

For the example shown, HashTable wins, I believe. The problem with the array approach is that it doesn't scale. I imagine you want to have more than two entries in the table, and the condition branch tree in doSomethingToObject will quickly get unwieldly and slow.

sehe
  • 374,641
  • 47
  • 450
  • 633
  • 2
    Just mincing words - For the example shown, array probably wins. But for a different example where there are more `Identifier` values to check against, what you say holds. – jwd Jun 24 '11 at 00:19
  • The array's length is known too (6 if it matters) – Timotheus Jun 24 '11 at 00:21
2

Logically, HashMap is definitely a fit in your case. From performance standpoint is also wins since in case of arrays you will need to do number of string comparisons (in your algorithm) while in HashMap you just use a hash code if load factor is not too high. Both array and HashMap will need to be resized if you add many elements, but in case of HashMap you will need to also redistribute elements. In this use case HashMap loses.

Alex Gitelman
  • 24,429
  • 7
  • 52
  • 49
0

Arrays will usually be faster than Collections classes.

PS. You mentioned HashTable in your post. HashTable has even worse performance thatn HashMap. I assume your mention of HashTable was a typo

"The HashTable one looks much much better "

Basanth Roy
  • 6,272
  • 5
  • 25
  • 25
0

The example is strange. The key problem is whether your data is dynamic. If it is, you could not write you program that way (as in the array case). In order words, comparing between your array and hash implementation is not fair. The hash implementation works for dynamic data, but the array implementation does not.

If you only have static data (6 fixed objects), array or hash just work as data holder. You could even define static objects.

BlueGuitar
  • 41
  • 3