0

I am trying to find the fastest and most efficient way to get the first top K items in arrayList of objects based on a custom compareable implementation.

during my research some suggested that i should use Max/Min heap which is abstracted in java as PriorityQueue. However, the problem is I dont know how to implement that on an arrayList of objects

here is my Object instance

public class PropertyRecord {

    private long id;
    private String address, firstName, lastName, email, ownerAddress;
    private LocalDate dateSold;
    private BigDecimal price;


    public PropertyRecord(long id, String address, String firstName, String lastName, String email, String ownerAddress, LocalDate dateSold, BigDecimal price) {

        this.id = id;
        this.address = address;
        this.firstName = firstName;
        this.lastName = lastName;
        this.email = email;
        this.ownerAddress = ownerAddress;
        this.dateSold = dateSold;
        this.price = price;

    }
 //getters and setters...
}

i want to get the first top k items based on the price. I have written a method (see below) which takes the arrayList and K (get first K items) and used StreamAPI but i know it is not the most efficient way to do it because this will sort the whole list even though I want only the first K items. so instead of having an O(n) i want have O(k log n).

//return the top n properties based on sale price.
    public List<PropertyRecord> getTopProperties(List<PropertyRecord> properties, int n){

       //using StreamAPI
       return properties.stream()
               .sorted((p1, p2) -> p2.getPrice().compareTo(p1.getPrice()))
               .limit(n)
               .collect(Collectors.toList());

    }

Any Help Please?

Faisal Julaidan
  • 388
  • 4
  • 16
  • 1
    Can you just sort the list? and take top k or bottom k? – Sam Orozco Apr 09 '18 at 23:11
  • I can sort it but i don't want to sort the whole list. if i have 1000 items why sort the whole list even though i want the first top 10 lets say? – Faisal Julaidan Apr 09 '18 at 23:16
  • 2
    Sorting a subset of the list doesn't make any sense, as you don't know the maximum value, where it's located, etc. – Jacob G. Apr 09 '18 at 23:18
  • so what do you suggest? – Faisal Julaidan Apr 09 '18 at 23:19
  • Replace your `List` with an implementation of a heap like you were suggested. – Jacob G. Apr 09 '18 at 23:20
  • how? that is my problem how can implement that? any code example? – Faisal Julaidan Apr 09 '18 at 23:21
  • A quick Google search yields the following: https://stackoverflow.com/questions/14165325/is-there-a-heap-in-java – Jacob G. Apr 09 '18 at 23:23
  • It's not possible to have any algorithm faster than O(n) becasue every element must be inspected at least once. – Bohemian Apr 09 '18 at 23:49
  • What exactly is your question? How turn an ArrayList into a PriorityQueue? Why do you need the ArrayList so badly that you cannot just stick to using a PriorityQueue? – jxh Apr 10 '18 at 00:01
  • Also, consider whether or not this List is going to be read or written to more frequently. If you use a PriorityQueue, your reads for the top K should be constant, but your writes will be O(log(n)). However, if you use a typical List implementation, your writes (assuming you only add to the tail) will be constant, but your reads for top K will be O(n). Choose wisely. Profile your application to determine how much CPU time is spent where, or if it's even a performance issue at all. – whaley Apr 10 '18 at 00:05
  • my program has a lot of things to do with ArrayList such as search the ArrayList and get an item based on the ID, also get total amount of property prices in a specific date. So changing my data structure from ArrayList to PriorityQueue would ruin other functionalities! don't you think so ? or PriorityQueue would give me all of that more efficiently. it just i have never used PriorityQueue and that is why I am sticking with ArrayList not to mention that performance does matter in my program a lot. – Faisal Julaidan Apr 10 '18 at 00:52
  • I looked up for TreeMap and it seems suitable for my case except that I cannot sort it by value and only can sort it by keys. So i cannot have TreeMap sorted by PropertyRecord.getPrice(). any idea how to overcome this issue ? i want to keep the keys as long for fast retrieval and PropertyRecord as values! – Faisal Julaidan Apr 10 '18 at 23:27

1 Answers1

2

Guava contains a TopKSelector class that can do exactly this.

In the latest Guava version, this functionality is now exposed as Comparators.greatest().

However, if you're not locked into using an ArrayList for storage, you're probably better off using a PriorityQueue which will naturally keep the elements in priority order.

Daniel Pryden
  • 59,486
  • 16
  • 97
  • 135
  • well, my program has a lot of things to do with array list such as search the arraylist and get an item based on the ID, also get total amount of property prices in a specific date. So changing my data structure from ArrayList to PriorityQueue would ruin other functionalities! don't you think so ? – Faisal Julaidan Apr 10 '18 at 00:37
  • You're probably better off building a custom data storage object that exposes the operations you need, which underneath the hood could be implemented with multiple data structures. For your purposes, a heap (PriorityQueue) and hashtable (HashMap) combination sounds like a good choice. – Daniel Pryden Apr 10 '18 at 09:44
  • HashMap would be used for what in my case? searching? – Faisal Julaidan Apr 10 '18 at 16:11
  • Yes, lookup by (non-contiguous) ID is pretty much the canonical use case for a hashtable. – Daniel Pryden Apr 10 '18 at 17:21
  • After doing some research i ended up with LinkedHashMap data structure which will allow me to read my csv (each row represent a propertyRecord in the csv file) and while reading i will add these records to the LinkedHashMap sorted based on their price and each has the ID as their keys. So when I search for a record (property) i will look for its key only and no need for iteration. and at the same time i can get top k records fast and easy based on their prices since they are already sorted from the beginning. What do you think ? – Faisal Julaidan Apr 10 '18 at 17:39
  • 2
    If you don't need to worry about data changing over time, then sorting the data once is the fastest possible solution. The purpose of things like `PriorityQueue` is to allow you to insert and remove records efficiently while still allowing you to iterate them in priority order. In your case, `LinkedHashMap` is probably good enough, but you might also consider using a `TreeMap` which will sort using your comparator; there are trade-offs either way. If your data never changes then I would sort it once and then keep it in a Guava `ImmutableMap`. – Daniel Pryden Apr 10 '18 at 17:52
  • Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/168670/discussion-between-faisal-julaidan-and-daniel-pryden). – Faisal Julaidan Apr 10 '18 at 18:12
  • I looked up for TreeMap and it seems suitable for my case except that I cannot sort it by value and only can sort it by keys. So i cannot have TreeMap sorted by PropertyRecord.getPrice(). any idea how to overcome this issue ? i want to keep the keys as long for fast retrieval and PropertyRecord as values! – Faisal Julaidan Apr 10 '18 at 23:26