The first option should be faster. You can probably make it faster still by pre-sizing the set before using it. For example, if you expect a small number of duplicates:
Set<String> undefined = new HashSet<String>(pairs.size(), 1);
Note that I used 1 for the load factor to prevent any resizing.
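As a minimal sketch of the pre-sizing idea, assuming `pairs` is a `List<String>` (the class name `PresizedDedup` and the method `countDistinct` are made up for illustration):

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class PresizedDedup {
    // Hypothetical helper: counts the distinct strings using a pre-sized HashSet.
    static int countDistinct(List<String> pairs) {
        // Initial capacity = number of inputs, load factor = 1: the backing table
        // only grows once the size exceeds capacity * loadFactor, which cannot
        // happen when at most pairs.size() elements are added.
        Set<String> unique = new HashSet<>(pairs.size(), 1f);
        unique.addAll(pairs);
        return unique.size();
    }

    public static void main(String[] args) {
        List<String> pairs = Arrays.asList("a=1", "b=2", "a=1", "c=3");
        System.out.println(countDistinct(pairs)); // prints 3
    }
}
```

The trade-off is a denser table than with the default load factor of 0.75, so individual lookups may hit slightly more collisions.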
Out of curiosity I ran a test (code below). The results, in milliseconds for 100 runs and measured after the JIT warm-up, are:
Test 1 (note: takes a few minutes with warm up)
size of original list = 3,000 with no duplicates:
set: 8
arraylist: 668
linkedlist: 1166
Test 2
size of original list = 30,000 - all strings identical:
set: 25
arraylist: 11
linkedlist: 13
That kind of makes sense:
- when there are many duplicates, List#contains runs fairly fast because a match is found early, while the cost of allocating a large set plus the hashing penalises the set
- when there are no or very few duplicates, the set wins, by a large margin.
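To put rough numbers on that reasoning, here is a small sketch that counts how many `equals` comparisons a list-based dedup performs. It re-implements the `contains` scan by hand so the comparisons can be counted; the class `ContainsCost` and the method `countComparisons` are hypothetical names, not part of the test code.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

final class ContainsCost {
    // Counts the equals() calls made while deduplicating with a list,
    // mimicking the linear scan that List#contains performs.
    static long countComparisons(List<String> input) {
        List<String> unique = new ArrayList<>();
        long comparisons = 0;
        for (String s : input) {
            boolean found = false;
            for (String u : unique) {
                comparisons++;
                if (u.equals(s)) {
                    found = true;
                    break; // a duplicate ends the scan early
                }
            }
            if (!found) {
                unique.add(s);
            }
        }
        return comparisons;
    }

    public static void main(String[] args) {
        int n = 1_000;

        // All strings identical: one comparison per element after the first.
        List<String> same = Collections.nCopies(n, "x");
        System.out.println(countComparisons(same)); // prints 999

        // All strings distinct: element i is compared with i earlier ones.
        List<String> distinct = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            distinct.add("s" + i);
        }
        System.out.println(countComparisons(distinct)); // prints 499500
    }
}
```

With all-identical input the scan stops after a single comparison, so the work grows linearly; with all-distinct input it grows quadratically, which is consistent with the gap seen in Test 1.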
import java.util.ArrayList;
import java.util.HashSet;
import java.util.Iterator;
import java.util.LinkedList;
import java.util.List;
import java.util.Random;
import java.util.Set;

public class TestPerf {
    private static int NUM_RUN;
    private static Random r = new Random(System.currentTimeMillis());
    // true: random strings (almost no duplicates); false: every string is identical
    private static boolean random = false;

    public static void main(String[] args) {
        // build the input list (30,000 entries here as in Test 2; Test 1 used 3,000)
        List<String> list = new ArrayList<>();
        for (int i = 0; i < 30_000; i++) {
            list.add(getRandomString());
        }

        // warm up so the JIT has compiled all three methods before timing
        for (int i = 0; i < 10_000; i++) {
            method1(list);
            method2(list);
            method3(list);
        }

        NUM_RUN = 100;

        // sum keeps the results live so the calls are not optimised away
        long sum = 0;
        long start = System.nanoTime();
        for (int i = 0; i < NUM_RUN; i++) {
            sum += method1(list);
        }
        long end = System.nanoTime();
        System.out.println("set: " + (end - start) / 1000000);

        sum = 0;
        start = System.nanoTime();
        for (int i = 0; i < NUM_RUN; i++) {
            sum += method2(list);
        }
        end = System.nanoTime();
        System.out.println("arraylist: " + (end - start) / 1000000);

        sum = 0;
        start = System.nanoTime();
        for (int i = 0; i < NUM_RUN; i++) {
            sum += method3(list);
        }
        end = System.nanoTime();
        System.out.println("linkedlist: " + (end - start) / 1000000);

        System.out.println(sum);
    }

    // dedup with a pre-sized HashSet
    private static int method1(final List<String> list) {
        Set<String> set = new HashSet<>(list.size(), 1);
        for (String s : list) {
            set.add(s);
        }
        return set.size();
    }

    // dedup with ArrayList#contains
    private static int method2(final List<String> list) {
        List<String> undefined = new ArrayList<>();
        for (String s : list) {
            if (!undefined.contains(s)) {
                undefined.add(s);
            }
        }
        return undefined.size();
    }

    // dedup with LinkedList#contains
    private static int method3(final List<String> list) {
        List<String> undefined = new LinkedList<>();
        Iterator<String> it = list.iterator();
        while (it.hasNext()) {
            String value = it.next();
            if (!undefined.contains(value)) {
                undefined.add(value);
            }
        }
        return undefined.size();
    }

    private static String getRandomString() {
        if (!random) {
            return "skdjhflkjrglajhsdkhkjqwhkdjahkshd";
        }
        int size = r.nextInt(100);
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < size; i++) {
            // nextInt(26) keeps the character within 'a'..'z'
            char c = (char) ('a' + r.nextInt(26));
            sb.append(c);
        }
        System.out.println(sb);
        return sb.toString();
    }
}