4

I encountered below code in my project. I was wondering if it can be optimized further may be by using java 8 streams or by collection APIs in general.

private Set<Student> getFilteredSet() {
    Set<Student> unfilteredSet = getAllStudents();
    Set<Student> adminAreaSet = getAdminStudents();

    Set<String> adminAreaID = new HashSet<>();
    Set<Student> filteredSet = new HashSet<>();

    for (final Student student : adminAreaSet) {
        adminAreaID.add(student.getId());
    }
    for (final Student student : unfilteredSet) {
        if (adminAreaID.contains(student.getId())) {
            filteredSet.add(student);
        }
    }   
    return filteredSet;
}

Note: unfilteredSet and adminAreaSet hold different child types of Student

shmosel
  • 49,289
  • 6
  • 73
  • 138
Faiz Kidwai
  • 463
  • 5
  • 26

2 Answers2

8

Since the question is tagged with , one way to improve readability of the code could be to transform it as :

Set<String> adminAreaID = getAdminStudents().stream()
        .map(Student::getId)
        .collect(Collectors.toSet());

return getAllStudents().stream()
        .filter(student -> adminAreaID.contains(student.getId()))
        .collect(Collectors.toSet());
Naman
  • 27,789
  • 26
  • 218
  • 353
  • 2
    @shmosel Maybe not in this scope/context of the method (since we have already achieved the goal). Just wanted to point out that there is no guarantee on the type of Set returned by `Collectors.toSet()`. But I agree, it is not needed here. – Thiyagu Aug 31 '18 at 03:47
  • @shmosel I was just thinking about the complexity of `contains` - O(1) in HashSet vs O(log n) in TreeSet – Thiyagu Aug 31 '18 at 03:49
  • @user7 Who said anything about TreeSet? – shmosel Aug 31 '18 at 03:49
  • @shmosel Since the javadoc of `Collectors.toSet` does not say which implementation of Set will be returned. – Thiyagu Aug 31 '18 at 03:51
  • @user7 the implementation in the `Collectors` class(from jdk) is what you might be interested to take a look at. – Naman Aug 31 '18 at 03:53
  • 2
    Yes. Today it is returning `HashSet::new`. But there is no guarantee that it would continue to do the same in the future.. right? – Thiyagu Aug 31 '18 at 03:54
  • @user7 Right. But the [documentation](https://docs.oracle.com/javase/9/docs/api/java/util/stream/Collectors.html#toSet--) is clear about which attributes might potentially change: *type, mutability, serializability, or thread-safety*. Worrying about performance seems a stretch. If you're going down that road, why ever use `toSet()`? Or [`Set.of()`](https://docs.oracle.com/javase/9/docs/api/java/util/Set.html#of-E...-) for that matter? Maybe it returns a set with linear lookup time. – shmosel Aug 31 '18 at 03:58
  • 2
    @shmosel Yes. I agree with what you say. If that is the case, then no one would ever use it and use the one (toCollection) by passing the type we expect. – Thiyagu Aug 31 '18 at 04:00
  • 1
    @user7 mind that even `HashSet`’s `contains` does not always provide `O(1)`. If `Collectors.toSet` ever returns something different, there must be a reason, which might be a reason for your application, to use that different `Set` implementation as well. – Holger Aug 31 '18 at 07:40
  • @Holger Yes. It is not always `O(1)` if there are a lot of collisions. By your last statement, do you mean that if the JDK guys change it, then it will be sort of a change that will be beneficial to all using them (without being bothered about what the implementation is)? – Thiyagu Aug 31 '18 at 09:10
  • 2
    @user7 yes, you can assume that they won’t make it worse, however, there might be different considerations. What if your code is running in an environment with really low memory and the particular JRE provides a map which has worse than `O(1)` lookup, but needs significant less memory? This is what I mean with “there’s a reason” and you don’t benefit from counteracting it. So if you don’t need guarantees, like mutability, don’t insist on getting a `HashMap`. – Holger Aug 31 '18 at 12:09
2

According to your comment, you're looking for speed optimization. There is a lot of post comparing Stream and Collection and even more on whole Internet. I recommend you by example to have a look at this question which compares speed performance between Streams and old for each loop: Java 8: performance of Streams vs Collections. As using Stream creates a lot of intermediate objects and call intermediate methods, it seems normal to be slower than basic for each loop. However you can use stream to have more readable/smaller code.

To answer the question, I think your code is already very good considering speed performance. All I see is that you should initialize adminAreaID because you know exactly the size it will have :

    Set<String> adminAreaID = new HashSet<>(adminAreaSet.size(), 1.);

By setting size and load factor, you ensure no time will be used to grow up your set. According to https://docs.oracle.com/javase/7/docs/api/java/util/HashMap.html :

The load factor is a measure of how full the hash table is allowed to get before its capacity is automatically increased.

You have to set it at 1. because you won't get higher the adminAreaSet size. Moreover if you let it at .75 (the default value), your Set will grow up once when the loop will reach 75% of its capacity which is useless.

If you have no memory concern, you shoul do the same with filteredSet :

    Set<Student> filteredSet = new HashSet<>(unfilteredSet.size(), 1.);

In fact, as you filter unfilteredSet, you won't reach the max capacity but it will ensure you that filteredSet will not grow up during its filling.