63

I had originally written an ArrayList and stored unique values (usernames, i.e. Strings) in it. I later needed to use the ArrayList to search if a user existed in it. That's O(n) for the search.

My tech lead wanted me to change that to a HashMap and store the usernames as keys in the array and values as empty Strings.

So, in Java -

hashmap.put("johndoe","");

I can see if this user exists later by running -

hashmap.containsKey("johndoe"); 

This is O(1) right?

My lead said this was a more efficient way to do this and it made sense to me, but it just seemed a bit off to put null/empty as values in the hashmap and store elements in it as keys.

My question is, is this a good approach? The efficiency beats ArrayList#contains or an array search in general. It works. My worry is, I haven't seen anyone else do this after a search. I may be missing an obvious issue somewhere but I can't see it.

River
  • 8,585
  • 14
  • 54
  • 67
dozer
  • 861
  • 1
  • 11
  • 22
  • 3
    plus1 because it is a valid question when one doesn't know Java's data structures. – Daniel Aug 01 '16 at 05:23
  • 5
    A `HashSet` is an implementation of the `HashMap.keySet()`. If you want to turn a Map into a Set you can use `set = Collections.newSetFromMap(map)` – Peter Lawrey Aug 01 '16 at 07:48
  • This question is not wrong. This forum is the wrong place. "Stack Overflow is a question and answer site for professional and enthusiast programmers". – rdllopes Aug 01 '16 at 12:49
  • If the usernames are case-insensitive a `map` where you use the name as both key and value might be useful to map to the canonical representation of a username. – CodesInChaos Aug 01 '16 at 13:10
  • @rdllopes Absolutely nothing about that sentence suggests the question is inappropriate for Stack Overflow. – Chris Hayes Aug 02 '16 at 08:39
  • Come on, @ChrisHayes. A professional or enthusiast programmer should know that. "That is O(1), right!?" Sure, indeed it is in [Javadoc](http://docs.oracle.com/javase/6/docs/api/java/util/HashSet.html). Second, OP just described the `HashSet` implementation `public boolean add(E e) {return map.put(e, PRESENT)==null;}` using empty string instead of `new Object()` for `PRESENT` constant. – rdllopes Aug 02 '16 at 08:57
  • 3
    @rdllopes I guarantee that you, and nearly everyone on this site, has a gap in knowledge that somebody would claim you "should know". Tons of the highest rated questions on the site fall into that category. You don't get to be the arbiter of what questions are non-obvious enough to belong here. – Chris Hayes Aug 02 '16 at 09:05
  • @rdllopes. That O(1) right question was a rhetoric question . Someone (a mod I guess) edited it and put it in a separate paragraph making it look explicit. That bit was just figure of speech rather than an explicit question. The edit made it look explicit. – dozer Aug 02 '16 at 09:11

2 Answers2

101

Since you have a set of unique values, a Set is the appropriate data structure. You can put your values inside HashSet, an implementation of the Set interface.

My lead said this was a more efficient way to do this and it made sense to me, but it just seemed a bit off to put null/empty as values in the hashmap and store elements in it as keys.

The advice of the lead is flawed. Map is not the right abstraction for this, Set is. A Map is appropriate for key-value pairs. But you don't have values, only keys.

Example usage:

Set<String> users = new HashSet<>(Arrays.asList("Alice", "Bob"));

System.out.println(users.contains("Alice"));
// -> prints true

System.out.println(users.contains("Jack"));
// -> prints false

Using a Map would be awkward, because what should be the type of the values? That question makes no sense in your use case, as you have just keys, not key-value pairs. With a Set, you don't need to ask that, the usage is perfectly natural.

This is O(1) right?

Yes, searching in a HashMap or a HashSet is O(1) amortized worst case, while searching in a List or an array is O(n) worst case.


Some comments point out that a HashSet is implemented in terms of HashMap. That's fine, at that level of abstraction. At the level of abstraction of the task at hand --- to store a collection of unique usernames, using a set is a natural choice, more natural than a map.

John Kugelman
  • 349,597
  • 67
  • 533
  • 578
janos
  • 120,954
  • 29
  • 226
  • 236
  • 2
    One thing I would mention, while I agree with this answer, you should clarify with your tech-lead if there was a reason behind wanting a `Map`. Perhaps they are under the assumption that you would be using this map to store other information related to the userid? If there is any other reason at all to have user related data in memory, you would probably want to store it there rather than creating another collection somewhere else, duplicating code. – gmiley Aug 01 '16 at 13:56
  • 13
    Note that a HashSet is implemented as a HashMap with an empty object (a single instance for all values) as the value. – Jim Garrison Aug 01 '16 at 14:49
  • 5
    @Janos: I wouldn't really say that the lead's advice was flawed... the idea was right, just the data structure was not the optimal choice. Map, even with empty values, still uses hash as a lookup method for the keys. So it is faster than array iteration. It is possible the tech lead comes from a Perl background - where it's common practice to use a hash with empty (or dummy) values to do O(1) key-exists checks. Perl doesn't have a Set data structure. – Greg Kennedy Aug 01 '16 at 17:28
  • 2
    @JörgWMittag To be even more precise, in Java 8, if the keys are Comparable, it's O(log n) worst-case; if a particular bucket is too overloaded, its linked-list collision handling will switch to a TreeSet-style bucket. This is to avoid a DoS attack in which the attacker could define, for example, a URL whose query string entries intentionally collide, turning an expected O(1) into O(n) (which then turn expected O(n) loops into O(n^2), etc). – yshavit Aug 02 '16 at 04:51
  • `Map` can be better than than `Set` in case you need to use methods like `computeIfAbsent` added to `Map` in Java 8 – Sergey Fedorov Aug 02 '16 at 19:59
38

This is basically how HashSet is implemented, so I guess you can say it's a good approach. You might as well use HashSet instead of your HashMap with empty values.

For example :

HashSet's implementation of add is

public boolean add(E e) {
    return map.put(e, PRESENT)==null;
}

where map is the backing HashMap and PRESENT is a dummy value.

My worry is, I haven't seen anyone else do this after a search. I may be missing an obvious issue somewhere but I can't see it.

As I mentioned, the developers of the JDK are using this same approach.

Eran
  • 387,369
  • 54
  • 702
  • 768
  • Thanks, why not just use the existing implementation of a HashMap.put("aaa","") instead of a HashSet? Also since this approach is good, doesn't it make array's redundant? – dozer Aug 01 '16 at 05:26
  • 21
    @dozer HashSet is already an existing class which is part of the JDK, so why reinvent the wheel? It doesn't make arrays redundant, since arrays are more efficient when the number of elements is fixed, and arrays (as well as ArrayLists) allow duplicates. – Eran Aug 01 '16 at 05:29
  • 3
    @dozer Arrays don't just store elements, they store them in a fixed position; in a sense, they associate a number (the position) to the element. The same information can be stored in a Map, but (if the elements aren't sparse in their positions) in a much less efficient way – gcali Aug 01 '16 at 09:23
  • 5
    @dozer "doesn't it make array's redundant?" The locality of elements in an array makes iterating faster than with data structures which allocate each element separately (maps, sets, lists etc.). Usually successive array elements are cached together because they are close in memory, minimizing memory access. – Peter - Reinstate Monica Aug 01 '16 at 10:46