88

I want to know the comparison between List and Set in terms of performance,memory allocation and usability.

If i don't have any requirement of keeping the uniqueness in the list of objects, neither required the insertion order to be maintained, Can I use ArrayList and SortedSet/HashSet interchangeably? Will it be good to directly use Collections class instead of even list/set?

P.S. I also don't have any need for list or set specific functions provided by java. I am using List/Set instead of Array only because they can dynamically grow without extra programming efforts.

Cœur
  • 37,241
  • 25
  • 195
  • 267
Priyank Doshi
  • 12,895
  • 18
  • 59
  • 82
  • 1
    For starters : http://stackoverflow.com/questions/1035008/what-is-the-difference-between-set-and-list – Kazekage Gaara May 29 '12 at 12:46
  • 1
    What *do* you need from the collection? – Jon Skeet May 29 '12 at 12:47
  • 4
    You need to specify exactly what operations you plan to use for it. Every implementation is a tradeoff in favor of some operation at the expense of others. – Marko Topolnik May 29 '12 at 12:49
  • What you are currently stating is that you need the collection **only to add elements**. That is nonsense. There must be some kind of read operation that you also need. Specify. – Marko Topolnik May 29 '12 at 12:52
  • Collection is not a class, btw. Not even an abstract class. – Marko Topolnik May 29 '12 at 12:53
  • I may use a iterator. But there is no need for remaining insertion order. So even in this case, using any of list/set won't matter , i guess. – Priyank Doshi May 29 '12 at 12:55
  • If you want to use List/Set instead of Array.. So in this can case you should go for List instead of Set, because Array can have duplicates. but set cant have duplicates. So you may loose some duplicate data. – Satya May 10 '13 at 11:57
  • Finding an element by value in a HashSet is O(1). In an ArrayList, it's O(n). and HashSet consumes about 5.5 times more memory than ArrayList
    So which better to use in inverted index.
    – jsingh Feb 04 '17 at 05:39

6 Answers6

120

HashSet consumes about 5.5 times more memory than ArrayList for the same number of elements (although they're both still linear), and has significantly slower iteration (albeit with the same asymptotics); a quick Google search suggests a 2-3x slowdown for HashSet iteration versus ArrayList.

If you don't care about uniqueness or the performance of contains, then use ArrayList.

Divyesh Kanzariya
  • 3,629
  • 3
  • 43
  • 44
Louis Wasserman
  • 191,574
  • 25
  • 345
  • 413
  • 1
    Well, if you know the number of elements to expect in advance, and allocate the `ArrayList` accordingly, it's closer to 8 times. But it doesn't sound like we can make that assumption about the OP's situation. – Louis Wasserman May 29 '12 at 13:02
  • 1
    Basically, the rule is that arrays are 4 bytes per element, and an `ArrayList` consists of little more than an array (with some room to grow) and a constant number of `int` fields. – Louis Wasserman May 29 '12 at 13:15
  • Nowadays they're more likely to be 8 bytes/element. – Marko Topolnik May 29 '12 at 13:36
  • 3
    Depends if you're using a 32-bit or a 64-bit VM. That said, `HashSet` gets hurt worse by 8-byte references than `ArrayList` does -- adding an extra 4 bytes per reference, based on the linked memory consumption chart, brings `ArrayList` up to ~12 bytes per element, and `HashSet` up to ~52 bytes per element.) – Louis Wasserman May 29 '12 at 13:39
69

If you don't care about the ordering, and don't delete elements, then it really boils down to whether you need to find elements in this data structure, and how fast you need those lookups to be.

Finding an element by value in a HashSet is O(1). In an ArrayList, it's O(n).

If you are only using the container to store a bunch of unique objects, and iterate over them at the end (in any order), then arguably ArrayList is a better choice since it's simpler and more economical.

NPE
  • 486,780
  • 108
  • 951
  • 1,012
  • 4
    Finding a element in `ArrayList` is `O(n)`? Is not find element in `ArrayList` is direct access via index? – Pau Kiat Wee May 29 '12 at 12:53
  • 34
    I assume by "find element" he means `contains`. – Louis Wasserman May 29 '12 at 12:58
  • @PauKiatWee: I mean "find by value". OP has already stated that he doesn't care about the ordering of elements, so accessing by index is clearly irrelevant. – NPE May 29 '12 at 12:59
  • 3
    Finding an element by value in a HashSet is O(1) How is it? – jsingh Feb 04 '17 at 05:36
  • 1
    @Jitendra Because finding an element in a hash table involves computing a hash for that element, which is an O(1) operation, followed by using that hash to index into the array, which is an O(1) operation. – Tripp Kinetics Sep 18 '22 at 17:07
4

If you plan only to add elements and later iterate over them, your best bet is ArrayList as it's closest to the arrays you are replacing. It's more memory efficient than LinkedList or any Set implementation, has fast insertion, iteration, and random access.

Marko Topolnik
  • 195,646
  • 29
  • 319
  • 436
  • `ArrayList` doesn't have fast insertion. – Tripp Kinetics Sep 18 '22 at 17:12
  • It's O(n) just like random insertion into `LinkedList`, but with a much lower linear factor. So it's pretty good. – Marko Topolnik Sep 18 '22 at 17:22
  • Finding the insertion location in a `LinkedList` is O(n), but if you already have it, which is fairly common, the insertion is O(1). Finding the insertion location in an `ArrayList` is also O(n), and the insertion itself is O(n). So, best case scenario, when is when you have the insertion location already. Then `LinkedList` is O(1) and `ArrayList` is O(n). Worst case is that you need to find the insertion location. In this case, `LinkedList` is O(n) and `ArrayList` is an O(n), followed by an O(n), which is simply O(n). – Tripp Kinetics Sep 22 '22 at 23:16
  • `list.add(index, element)` is O(n) for both `LinkedList` and `ArayList` (as commented above). If you need to search for the insertion location, then the linear factor for traversing the `ArrayList` is also much lower than for `LinkedList`, could be as much as an order of magnitude if the indiividual node allocations aren't closely spaced. In most scenarios the `ArrayList` will come out as performing better overall, due to the way modern hardware penalizes unpredictable memory access patterns. – Marko Topolnik Sep 23 '22 at 07:27
4

If you will compare, searching between List and Set, Set will be better because of the underline Hashing algorithm.

In the case of a list, in worst case scenario, contains will search till the end. In case of Set, because of hashing and bucket, it will search only subset.

Sample use case: Add 1 to 100_000 integer to ArrayList and HashSet. Search each integer in ArrayList and HashSet.

Set will take 9 milliseconds where as List will take 16232 seconds.

private static void compareSetvsList(){
    List<Integer> list = new ArrayList<>() ;
    Set<Integer> set = new HashSet<>() ;

    System.out.println("Setting values in list and set .... ");
    int counter = 100_000  ;

    for(int i =0 ; i< counter ; i++){            
        list.add(i);
        set.add(i);
    }

    System.out.println("Checking time .... ");
    long l1 = System.currentTimeMillis();
    for(int i =0 ; i< counter ; i++) list.contains(i);

    long l2 = System.currentTimeMillis();
    System.out.println(" time taken for list : "+ (l2-l1));

    for(int i =0 ; i< counter ; i++)set.contains(i);

    long l3 = System.currentTimeMillis();
    System.out.println(" time taken for set : "+ (l3-l2));

    //      for 10000   time taken for list : 123        time taken for set : 4
    //      for 100000  time taken for list : 16232          time taken for set : 9
    //      for 1000000 time taken for list : hung       time taken for set : 26

}
Manoj
  • 41
  • 1
1

Use HashSet if you need to use .contains(T) frequently.

Example:

private static final HashSet<String> KEYWORDS = Stream.of(new String[]{"if", "do", "for", "try", "while", "break", "return"}).collect(Collectors.toCollection(HashSet::new));

public boolean isKeyword(String str) {
     return KEYWORDS.contains(str);
}
Dave_cz
  • 1,180
  • 10
  • 17
0

If you don't have the requirement to have unique elements in collection simply use ArrayList unless you have very specific needs.

If you have the requirement to have only unique elemets in collection, then use HashSet unless you have very specific needs.

Concerning SortedSet (and it's implementor TreeSet), as per JavaDoc:

A Set that further provides a total ordering on its elements. The elements are ordered using their natural ordering, or by a Comparator typically provided at sorted set creation time.

Meaning it's targeted at quite specific use cases, when elements should be always ordered in a set, which is not needed usually.

Yuriy Nakonechnyy
  • 3,742
  • 4
  • 29
  • 41
  • Element uniqueness is not the only advantage of a `HashSet`. Finding a specific element of a collection is an O(n) operation for an `ArrayList`, and an O(1) operation for a `HashSet`. In order to add ordering to the mix, you have two choices. There is the `TreeSet`, for which access operations are O(log(n)), or `LinkedHashSet`, for which access operations are O(1). The difference is that `TreeSet`s keep the elements sorted, where a `LinkedHashSet` maintains insertion order. – Tripp Kinetics Sep 18 '22 at 17:11