1

I already have a List<T>, which may or may not contain duplicates. I am going to build a Set<String> based on some result of computation using them, because two different Ts can produce the same String under some circumstances.

Some of the elements in the List<T> have undesirable attributes, so I need to filter them out. I filter like this:

List<T> myList = myCoolListGetter();
Iterator<T> it = myList.iterator();
T curr;
while (it.hasNext()) {
    curr = it.next();
    if (curr.shouldNotBeInResult()) {
        myList.remove(curr);
    }
}

I then pass this List<T> to another method which performs those computations I mentioned, adding them to a Set<String>.

I'm thinking perhaps I could save some time by inserting the elements which should be in the result rather into a Set<T> rather than removing them from the List<T>. Since I am already iterating over the List<T> here, and have to construct a Set<String> later anyway, would I actually be saving time by doing that?

2rs2ts
  • 10,662
  • 10
  • 51
  • 95
  • 2
    I would say to use the former approach since it is cleaner and will perform as expected. Except that you should use `it.remove` instead of `myList.remove(curr)` for a faster removal. – Luiggi Mendoza Apr 04 '14 at 20:43
  • You can use `System.currentTimeMillis()` to get a start time, then create the Set and call `System.currentTimeMillis()` again to get the end time. Subtract the start time from the end time to get the elapsed time. Repeat for the `List` removal approach and compare the results. – Mike B Apr 04 '14 at 20:44
  • 3
    @MikeB that would be doing a micro benchmark. Please refer to [How do I write a correct micro-benchmark in Java?](http://stackoverflow.com/q/504103/1065197) before trying to do these kind of tests. – Luiggi Mendoza Apr 04 '14 at 20:45
  • 1
    Before asking "What's faster?", you should first ask, "Does it matter?" – Lie Ryan Apr 04 '14 at 20:50
  • @LieRyan sometimes it does, it depends on the kind of system you're working on. The current system I'm working on was made based on this approach *does it matter the performance now?* from ancestral times and blindly trusting in the benefits of bytecode enhancements by the JVM and now one of the critical parts is to enhance the performance of all the system... – Luiggi Mendoza Apr 04 '14 at 20:51
  • For the question is X faster than Y the answer is always : measure it! In this case it's about how fast creation of strings is and how likely douplicates are. And what you should measure is what's the precentage of douplcates to make the set approach to be faster. – Bernd Ebertz Apr 04 '14 at 21:04
  • Move `curr` to inside the while loop. – Steve Kuo Apr 04 '14 at 21:17
  • @LieRyan In my case, a huge performance boost would matter, but small differences wouldn't. Thanks everyone. – 2rs2ts Apr 08 '14 at 16:22

3 Answers3

2

IMO, creating a new List<T> will be better then removing from the existing List<T>, if you can specify enough initial capacity (see - public ArrayList(int initialCapacity), the constructor that lets you specify the capacity of the ArrayList at the time of it's construction). Because then you will only be adding elements to it without having to readjust the capacity. Readjusting means creating a new backing array and copying the exiting elements over to that new array.

While on the other hand, removal from the list will require shifting the rest of the elements to the left. The only time that this operation will not require shifting the elements is when the element being removed is the last element.

The reason I said - a new List<T> instead of Set<T> is because unlike a set, list doesn't need to care whether an added element is duplicate or not.

Bhesh Gurung
  • 50,430
  • 22
  • 93
  • 142
1

Consider using Google Guava's filter and Predicate.

Collection<T> filtered = Collections2.filter(myCoolListGetter(),
    new Predicate<T>() {
      public boolean apply(T t) {
        return !t.shouldNotBeInResult();
      }
    });

or

Iterable <T> filtered = Iterables.filter(myCoolListGetter(),
    new Predicate<T>() {
      public boolean apply(T t) {
        return !t.shouldNotBeInResult();
      }
    });

The returned collection/iterator will be lazily evaluated, and O(n) if iterated. Lazily evaluation is very handy. Iterables.filter is particularly nice since it merely presents a filtered view of the original Iterable (list). No new ArrayList is created, it simply walks each item and calls apply.

Steve Kuo
  • 61,876
  • 75
  • 195
  • 257
0

Probably. Inserting something into a HashSet is O(1), since the hashing makes the duplication check really fast. So adding n things would only be O(n), which is how long it would take to go through your list once anyway.

azurefrog
  • 10,785
  • 7
  • 42
  • 56