13

I'm trying to implement a paging algorithm for a dataset sortable via many criteria. Unfortunately, while some of those criteria can be implemented at the database level, some must be done at the app level (we have to integrate with another data source). We have a paging (actually infinite scroll) requirement and are looking for a way to minimize the pain of sorting the entire dataset at the app level with every paging call.

What is the best way to do a partial sort, only sorting the part of the list that absolutely needs to be sorted? Is there an equivalent to C++'s std::partial_sort function available in the .NET libraries? How should I go about solving this problem?

EDIT: Here's an example of what I'm going for:

Let's say I need to get elements 21-40 of a 1000 element set, according to some sorting criteria. In order to speed up the sort, and since I have to go through the whole dataset every time anyway (this is a web service over HTTP, which is stateless), I don't need the whole dataset ordered. I only need elements 21-40 to be correctly ordered. It is sufficient to create 3 partitions: Elements 1-20, unsorted (but all less than element 21); elements 21-40, sorted; and elements 41-1000, unsorted (but all greater than element 40).

Platinum Azure
  • 45,269
  • 12
  • 110
  • 134
  • 2
    possible duplicate http://stackoverflow.com/questions/2540602/does-c-sharp-have-a-stdnth-element-equivalent – FlavorScape Mar 13 '13 at 20:08
  • Not at all-- that's a *selection* question, and this is a *partial sort* question. By all means, though, feel free to provide an answer on how this problem can be solved in terms of that problem, if that's possible. – Platinum Azure Mar 13 '13 at 20:10
  • When paging, if something was at the end of the list and by sorting it belonged at the beginning how would a partial sort work? Wouldn't any sort need to touch every element? – Jason Sperske Mar 13 '13 at 20:12
  • Was about to post the Array.Sort answer too, but as a more general querstion: If you use this to facilitate paging, wouldn't a partially sorted output be kind of confusing to the end user? I.e. how would a "new" record on page 2 behave that should go to page 1, according to order? – Roman Gruber Mar 13 '13 at 20:18
  • 1
    Can you clarify this a bit more? Are you saying if you have data like this: `9,8,7,6,5,4,3,2,1` and you say Top 3: you really want to get `1,2,3`. If the rest of the array is sorted, you don't care. Is that...sort of what you're after? – aquinas Mar 13 '13 at 20:23
  • @aquinas: That's a special case of what I'm after. What I really want is slightly more general: In your example, I want to be able to say "4th through 6th" and get something like: `3, 2, 1` (unsorted, but all less than proper 4th element); `4, 5, 6` (sorted and in the same place they would be for a sorted list); `8, 7, 9` (unsorted, but all greater than proper 6th element). As you can see, the Top 3 is a special case of that (and same with the Bottom 3, for that matter). – Platinum Azure Mar 13 '13 at 20:25
  • It's like a divide a conquer sort just stop dividing and conquering early once the range you are after is satisfied? – Jason Sperske Mar 13 '13 at 20:27
  • You could put it that way, sure. – Platinum Azure Mar 13 '13 at 20:27
  • why not take the values you have declared in the `var someCollection = new[]{5,2,7,9,0,6,8}` create an empty `int[] sortCollection = {}` then from there create a `var sortList = someCollection.ToList();` then sort the `sortList.Sort()` then do `sortCollection = sortList.ToArray();` now the sortCollection will have all your integer values sorted.. – MethodMan Mar 13 '13 at 20:31
  • @DJKRAZE That's a full sort, which I'm explicitly avoiding. – Platinum Azure Mar 13 '13 at 20:33
  • I misunderstood Platinum if needed I will delete my answer unless someone else may find it useful in the event that they need to do a full sort.. – MethodMan Mar 13 '13 at 20:37
  • why would you not store your data pre-sorted in the first place? you might look into a NoSQL solution to better optimize indexed data storage. – FlavorScape Mar 13 '13 at 21:38
  • FlavorScape: The data is partitioned into a SQL database (which we control) and a remote database accessible via REST interface, and we have no choice but to integrate them in the app layer. – Platinum Azure Mar 13 '13 at 22:29

3 Answers3

5

OK. Here's what I would try based on what you said in reply to my comment.

I want to be able to say "4th through 6th" and get something like: 3, 2, 1 (unsorted, but all less than proper 4th element); 4, 5, 6 (sorted and in the same place they would be for a sorted list); 8, 7, 9 (unsorted, but all greater than proper 6th element).

Lets add 10 to our list to make it easier: 10, 9, 8, 7, 6, 5, 4, 3, 2, 1.

So, what you could do is use the quick select algorithm to find the the ith and kth elements. In your case above i is 4 and k is 6. That will of course return the values 4 and 6. That's going to take two passes through your list. So, so far the runtime is O(2n) = O(n). The next part is easy, of course. We have lower and upper bounds on the data we care about. All we need to do is make another pass through our list looking for any element that is between our upper and lower bounds. If we find such an element we throw it into a new List. Finally, we then sort our List which contains only the ith through kth elements that we care about.

So, I believe the total runtime ends up being O(N) + O((k-i)lg(k-i))

static void Main(string[] args) {
    //create an array of 10 million items that are randomly ordered
    var list = Enumerable.Range(1, 10000000).OrderBy(x => Guid.NewGuid()).ToList();

    var sw = Stopwatch.StartNew();
    var slowOrder = list.OrderBy(x => x).Skip(10).Take(10).ToList();
    sw.Stop();
    Console.WriteLine(sw.ElapsedMilliseconds);
    //Took ~8 seconds on my machine

    sw.Restart();
    var smallVal = Quickselect(list, 11);
    var largeVal = Quickselect(list, 20);
    var elements = list.Where(el => el >= smallVal && el <= largeVal).OrderBy(el => el);
    Console.WriteLine(sw.ElapsedMilliseconds);
    //Took ~1 second on my machine
}

public static T Quickselect<T>(IList<T> list , int k) where T : IComparable {
    Random rand = new Random();
    int r = rand.Next(0, list.Count);
    T pivot = list[r];
    List<T> smaller = new List<T>();
    List<T> larger = new List<T>();
    foreach (T element in list) {
        var comparison = element.CompareTo(pivot);
        if (comparison == -1) {
            smaller.Add(element);
        }
        else if (comparison == 1) {
            larger.Add(element);
        }
    }

    if (k <= smaller.Count) {
        return Quickselect(smaller, k);
    }
    else if (k > list.Count - larger.Count) {
        return Quickselect(larger, k - (list.Count - larger.Count));
    }
    else {
        return pivot;
    }
}
aquinas
  • 23,318
  • 5
  • 58
  • 81
  • Thanks-- this does look promising. I'll give it a try! – Platinum Azure Mar 14 '13 at 22:24
  • Doesn't this approach assume that values aren't repeating? Suppose every value in the list is 4(smallVal) or 5(largeVal). Also aren't you assuming that the largeVal you are after is predictable? Maybe this is a weakness with `std::partial_sort` as well (this stuff is WAY over my head at the moment). – Jason Sperske Mar 14 '13 at 22:35
  • @aquinas Unfortunately, the link is now dead :-( – Platinum Azure Jan 24 '17 at 07:26
  • @PlatinumAzure, I love that you came back to this answer 4 years after you asked the question :) Fixed link. – aquinas Mar 20 '17 at 02:36
-1

You can use List<T>.Sort(int, int, IComparer<T>):

inputList.Sort(startIndex, count, Comparer<T>.Default);
Lee
  • 142,018
  • 20
  • 234
  • 287
  • That implies that the elements I want are all located contiguously, something that cannot be guaranteed on an unordered dataset. Once the partitioning I have described above is completed, this method (or the `Array.Sort` answer provided by Dai) will prove useful. – Platinum Azure Mar 13 '13 at 20:23
-2

Array.Sort() has an overload that accepts index and length arguments that lets you sort a subset of an array. The same exists for List.

You cannot sort an IEnumerable directly, of course.

Dai
  • 141,631
  • 28
  • 261
  • 374
  • That implies that the elements I want are all located contiguously, something that cannot be guaranteed on an unordered dataset. Once the partitioning I have described above is completed, this method (or the `List.Sort(int, int, IComparer)` answer provided by Lee) will prove useful. – Platinum Azure Mar 13 '13 at 20:23