1

I'm doing a problem that clearly requires usages of Set but I need to retrieve elements from a set in insertion order + retrieve specific elements.

But finding the specific element was too slow( I guess O(n) since I have to go through the entire set to find & return it ).

So I went with a LinkedHashMap<someClass,someClass> where the key-value mappings contained the same objects.

Although this was faster, it used up twice as much memory which is especially concerning if my key/value(both same anyways) happened to take up alot of space.

I'm hoping if anyone has a better solution to my problem or optimizations.

Edit: By the way the comments for this SO answer may help

Edit:

 public Set<CompanyDummy> run(int numberOfCompanies) 
 {
        Set<CompanyDummy> companies=new LinkedHashSet<>();
        
        //Create companies
        CompanyDummy company;
        for(int idx=0;idx<=numberOfCompanies-1;idx++)
        {
            company=new CompanyDummy(idx);
            companies.add(company);
        }
        
        
        //Some code here to fill up each CompanyDummy element with engineers
        
        //At this point,there will be input 
        //specifying which companies to merge via their index(not reference)
        //Problem guarantees those companies exist. Hence, my reason why 
        //I didn't  do something like
        //if (set.contains(value)) return value;
        
        //Do note when we merge companies u & v, v is closed down
        
        for(int idx=0;idx<=transactions-1;idx++)
        {
            companyID= scanner.nextInt();
            anotherCompanyID= scanner.nextInt();
            
            //This part is where I search through companies to find if one exists
            //which is O(n)
            //Replacing sets with maps somehow makes the program faster
            //despite LinkedHashSet being backed by LinkedHashMap
            company=findCompany(companies, companyID);
            anotherCompany=findCompany(companies, anotherCompanyID);
            
            if(company!=null && anotherCompany!=null)
            {
                company.union(anotherCompany);
                companies.remove(anotherCompany);
            }
            

        }
 }
 
 private CompanyDummy findCompany(Set<CompanyDummy> companies,int id)
 {
        
        for(CompanyDummy company : companies)
        {
            if(company.getIndex()==id)
            {
                return company;
            }
        }
        
        return null;
  }

}


class CompanyDummy
{
private int index;
private Set<Integer> engineers;//The Integer here just denotes the engineer

public  CompanyDummy(int index) 
{
    this.index=index;
}

public int getindex()
{
    return index;
}

public void union(CompanyDummy someCompany)
{
    this.engineers.addAll(someCompany.engineers);
}

}

Leon
  • 346
  • 3
  • 15
  • 2
    Could you explain how LinkedHashSet's don't work for your usage? – CodingTil Jan 13 '21 at 13:28
  • There's a point, where I have to retrieve specific elements as I stated. That part causes my program to be pretty slow for large inputs. – Leon Jan 13 '21 at 13:30

3 Answers3

2

Although this was faster, it used up twice as much memory which is especially concerning if my key/value(both same anyways) happened to take up alot of space.

I don't see how LinkedHashMap would be faster. Perhaps a little faster given that LinkedHashSet is implemented using a backing LinkedHashMap instance, so it will have a little overhead compared to using LinkedHashMap directly.

As for the memory, both will take the exact same amount of memory, no matter the size of the key/value instances. You see, when you add an element X to a LinkedHashSet, it actually puts the entry <X,PRESENT> to the underlying LinkedHashMap (where PRESENT is a reference to some dummy Object). Putting <X,X> instead of <X,PRESENT> makes no difference, since the LinkedHashMap only contains references to the keys and values, not copies of the key/value instances.

But finding the specific element was too slow (I guess O(n) since I have to go through the entire set to find it)

A HashSet/LinkedHashSet takes O(1) to find out if a specific element is already in the Set. You don't have to iterate over the entire Set. If the Set contains that element, you already have a reference to it. You don't have to look for the Set element which is equal to the one you are searching for.

Eran
  • 387,369
  • 54
  • 702
  • 768
  • For the last paragraph, I made an edit, I actually want to return it as well but since you say it used LinkedHashMap `internally`, I guess it's `O(1)`? For some reason, my program takes around 1 minute with `Set` and 3 seconds with `Map` – Leon Jan 13 '21 at 13:43
  • 2
    @Leon in that case, please provide 2 short code snippets that recreate your problem (one with LinkedHashMap and the other with LinkedHashSet).. – Eran Jan 13 '21 at 13:50
  • @Leon - based on the behavior you are describing, I suspect you are not using `Set.contains` to check set membership. `contains` is `O(1)` on a `LinkedHashSet`. – Stephen C Jan 13 '21 at 13:52
  • @StephenC Yup, I didn't use contains as I want to return it as well. The problem specified it'll be a valid element – Leon Jan 13 '21 at 13:55
  • Huh? I don't follow you. It would seem that `if (set.contains(value)) return value;` does that. Perhaps you need to show us the relevant code so that we can understand your problem. See Eran's comment above. – Stephen C Jan 13 '21 at 13:56
1

OK, so now that I see the code, I can understand the problem well enough to give you a solution.

You are not simply retrieving a specific element from the set. You are actually retrieving an element with a specific value in a specific field if it is a member of the set. That's why you can't use the set's (or map's) fast lookup operations1.

Solution:

If you want to do the retrieval in better than O(N), you need a Map<Integer, CompanyDummy>. The map must be populated so that it maps from an id to the CompanyDummy with that id as its index. Then you can replace the findCompany calls with Map.get calls.

Note that replacing a Set<CompanyDummy> with a Map<CompanyDummy, CompanyDummy> will not help for this problem. Both will give you O(N) performance.


You say this about your experiment with replacing a LinkedHashSet<CompanyDummy> with a LinkedHashMap<CompanyDummy, CompanyDummy>:

Although this was faster, it used up twice as much memory which is especially concerning if my key/value(both same anyways) happened to take up a lot of space.

I doubt that either of those things were actually true.

  1. Space utilization of the two data structures should be identical, given that that a LinkedHashSet is a actually LinkedHashMap under the covers.

  2. It is (IMO) unlikely that there was a significant real difference in performance. Certainly not a 2-fold difference. The most likely explanation is that your methodology for measuring and comparing performance does take proper account of possible timing distortions caused by (various) JVM startup overheads; e.g. class loading, heap resizing and garbage collection and JIT compilation.


1 - Just for completeness, if it wasn't a requirement that your set / map is ordered in insertion order, you could use a TreeSet<CompanyDummy> with a custom Comparator that orders CompanyDummy objects by their index values. You could then use OrderedSet.ceiling with a dummy CompanyDummy instances to probe the set in O(logN).

Stephen C
  • 698,415
  • 94
  • 811
  • 1,216
  • "If you want to do the retrieval in better than `O(N)`, you need a `Map`....Both will give you `O(N)` performance.". Do you mean to say the retrieval is `O(1)` but overall is still `O(N)`? – Leon Jan 13 '21 at 22:47
  • No. I mean that implementation of the semantics of your `findCompany` with a `Set` or a `Map` results in `O(N)` calls. You cannot make use of `Set.contains` or `Map.get` ... which is where the performance advantages are. You need fast lookup by `id` rather than iteration. Iteration is `O(N)`. – Stephen C Jan 13 '21 at 23:08
  • Do you mean `Map.get()` with `Map` is still `O(n)` or just replacing the `Set`s with `Map`s and still using `findCompany` is still `O(n)`? You meant the latter right ? – Leon Jan 13 '21 at 23:37
  • A `get` call on (say) a `LinkedHashMap` would be `O(1)`. **However**, using `get` with a `Map` won't work for your use-case. You can't implement your `findCompany` method that way. A `get` call requires the correct key as an argument. You don't have the `CompanyDummy` key yet. That is what you are trying to lookup! So you have to iterate, which is `O(N)`. – Stephen C Jan 14 '21 at 02:16
0

in java, HashSet is implemented using HashMap.

HashSet is indeed not intended for specific element retrieval. the key in your map should be something that identifies the value object which is mapped to it. also, it is usually a bad idea to have complex objects as map keys, unless you have a really good implementation of HashCode().

map will also not maintain insertion order (LinkedHashMap might keep it since it is backed by a LinkedList). you can set your keys to a sequence of integers denoting the insertion order. for example:

Map<Integer, SomeClass> map = new HashMap<>();
map.put(0, object1);
map.put(1, object2);

and so on. this will be better than a LinkedList because of fetch time is O(1). later, you can fetch them in a for loop with a running index to get them in order they were inserted.

if you need the property of HashSet that prevents duplicates in the collection, you will need to keep a Set of some unique identifier of each object. for instance ID# of a person. before each object inserted to the map you also check if the "unique key" is not in the set, and if so insert the "unique key" to the set and the map. it all depends on the nature of your objects.

let me know in the comments if you need anything else. good luck!

Tom Elias
  • 751
  • 6
  • 15