0

If I had the following

postcodeToCompaniesList = new HashMap<String, List<Company>>();

After some processig the hashmap has many thousands of postcodes, and each postcode is mapped to a list with 1 or more Company objects.

Now I want to get the top 10 most popular postcodes and print out the number of companies at the postcode i.e. "Postcode SW1A 0AA has 50 companies".

The approach I've taken, and I think this is naive, is to create a Record object to hold the postcode and the corrisponding number of companies at that postcode. I then create a RecordList object for managing a list of 10 Record objects which have the highest count value (see below).

The RecordList.addRecord(Record record) method adds records until 10 are added and then conditionally adds more only if the count value is higher than the lowest value currently in the list. Then it sorts the list and removes the lowest value thereby maintaing a count of 10 elements.

final class Record implements Comparable<Record> {
    public String postcode;
    public int count;

    public Record(String postcode, int count){
        this.postcode = postcode;
        this.count = count;
    }

    @Override 
    public int compareTo(Record r){
        return Integer.valueOf(this.count).compareTo(Integer.valueOf(r.count));
    }
}

final class RecordList{
    List<Record> top10 = new ArrayList<Record>();
    
    public void addRecord(Record record){
        if (this.top10.size() < 10){
            this.top10.add(record);
            Collections.sort(this.top10);
        } else if (record.count > this.top10.get(0).count){
            this.top10.add(record);
            Collections.sort(this.top10);
            this.top10.remove(0);
        }
    }
}

Next I use a simple loop to loop through all the keys (postcodes) and add them one at a time to the RecordList along with the companyCount.

RecordList recordList = new RecordList();
Set<String> postcodes = this.postcodeToCompaniesList.keySet();

for(String postcode : postcodes){
  int companyCount = this.postcodeToCompaniesList.get(postcode).size();
  recordList.addRecord(new Record(postcode, companyCount));
}

System.out.println(gson.toJson(recordList));

This works. However, I get the feeling that this is an very inefficent way to do this? Maybe Streams would work better??

What are the thoughts from the community? Is there a better, more elegent way to do this?

Adam Davies
  • 2,742
  • 4
  • 33
  • 52
  • 1
    Is it possible to do something similar but do it as the company is added to its list. At that point you know the new count; you then check that count against the min of top-10 and replace/insert as needed; if it's already in the top-10 then update its count and re-sort top-10. One optimization is if new count is less than min of top-10 then no further processing needed. – Computable Jan 09 '23 at 17:13
  • How is _**most popular postcodes**_ defined? Postcode with the highest count of companies? – Eritrean Jan 09 '23 at 17:16
  • @Eritrean, from the O/P code, Yes: The most popular postcodes are those with the highest count of companies. – Old Dog Programmer Jan 09 '23 at 17:27
  • 1
    It strikes me as odd to see code that looks like a sort is being called each iteration in a loop. But, with each invocation of sort, no more than 10 items will be sorted, and the sort will actually be called no more than 10 times. Even so, you might want to consider a [priority queue](https://docs.oracle.com/en/java/javase/11/docs/api/java.base/java/util/PriorityQueue.html) or sorted collection. – Old Dog Programmer Jan 09 '23 at 17:37
  • If you want to do a web search, try "java find n largest elements in an array" or the same with "array" changed to "collection" https://en.wikipedia.org/wiki/Selection_algorithm https://stackoverflow.com/questions/7272534/finding-the-first-n-largest-elements-in-an-array – Old Dog Programmer Jan 09 '23 at 17:44
  • 1
    I saw the following comment to a different question. It might apply here: "Questions asking about improvements to correct, working code are better suited to CodeReview.SE" – Old Dog Programmer Jan 09 '23 at 17:47
  • Thanks Old Dog Programmer. I did try it out with a PriorityQueue, but it didn't make any significant differance to the execution time, although I do like the idea of using the priority queue OOTB functionlity. – Adam Davies Jan 09 '23 at 17:59
  • Out of curiosity, what if postcodeToCompaniesList had millions of entries, and you wanted to find the top thousand popular postcodes? What would be the difference in execution time, if any? – Old Dog Programmer Jan 10 '23 at 00:28

2 Answers2

1

I'm going to assume List of Company is a List of Strings to make things simpler. I think this should do it:

HashMap<String, Integer> counter = new HashMap<>(); // postcodes counter
postcodeToCompaniesList.forEach((postcode, company) -> counter.put(postcode, company.size()));

counter.entrySet().stream()
    .sorted(Map.Entry.<String, Integer>comparingByValue().reversed())
    .limit(10)
    .forEach(System.out::println);
Flavian Iuga
  • 186
  • 5
  • Thanks, I will give this a go. I'm still getting to grips with the Stream API – Adam Davies Jan 09 '23 at 18:00
  • Hay Flavian, I don't understand the `.forEach(System.out::println);`. I understand the method passing to print them out, but how do I replace this in order to add each element into a List? Currently I have: ``` var recordList = new ArrayList(); .... .forEach(recordList.add(new Record(key, value))); ``` But that dosn't work. What does .ForEach pass in? – Adam Davies Jan 10 '23 at 10:00
  • Sussed it: .forEach(item -> recordList.add(new Record(item.getKey(), item.getValue()))); – Adam Davies Jan 10 '23 at 10:05
  • You don't have to print them, I think you can collect them like this: `List> result = counter.entrySet().stream() .sorted(Map.Entry.comparingByValue().reversed()) .limit(10) .collect(Collectors.toList());` – Flavian Iuga Jan 10 '23 at 15:44
1

You could stream over your entries sort them by the list size in reverse order, limit to n entries (in your case 10) and collect them in a LinkedHashMap:

private static Map<String, List<Company>> topN(Map<String, List<Company>> map, int n) {
    return map.entrySet()
              .stream()
              .sorted(Comparator.comparingInt((Map.Entry<String, List<Company>> e) -> e.getValue().size()).reversed())
              .limit(n)
              .collect(Collectors.toMap(Map.Entry::getKey, Map.Entry::getValue, (x, y) -> y, LinkedHashMap::new));
}

and use it like below, for example to get the top 7 postcodes:

topN(postcodeToCompaniesList, 7)
        .forEach((k,v) -> System.out.printf("Postcode %s has %d companies.%n", k, v.size()));
Eritrean
  • 15,851
  • 3
  • 22
  • 28