1

How can I sort the values and place them in the array in descending order directly in the for loop itself, rather than using the sorted function after the for loop?

final_lst = []
for row in data:
    score = function_returns_a_score()
    final_lst.append({"score": score})

print(final_lst)
# returns 
# [{"score": 10}, {"score": 40}, {"score": 90}, {"score": 15}]

print(sorted(final_lst, key=lambda k: k['score'], reverse=True))
# returns
# [{'score': 90}, {'score': 40}, {'score': 15}, {'score': 10}]
Prashant Sengar
user_12

3 Answers

2

You could use a heap queue:

import random
import heapq

final_list = []
for score in random.sample(range(100), 20): # in random order
    # negative because we're sorting in reverse order
    heapq.heappush(final_list, -score)

final_list = [{'score': -heapq.heappop(final_list)} for _ in range(len(final_list))]

Sample result:

[{'score': 95}, {'score': 94}, {'score': 89}, {'score': 72}, {'score': 71}, {'score': 65}, {'score': 60}, {'score': 58}, {'score': 51}, {'score': 50}, {'score': 45}, {'score': 44}, {'score': 36}, {'score': 35}, {'score': 33}, {'score': 26}, {'score': 25}, {'score': 18}, {'score': 6}, {'score': 3}]

I'm not sure that this would have better complexity than sorting, but it lets you extract the data in sorted order whenever you want: you can call heapq.heappop(final_list) when you need the next value. Sorting, by contrast, does all the work up front.
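
If the records need to stay attached to their scores, one possible variation is to push tuples onto the heap instead of bare numbers. The sketch below is only an illustration: the `scores` list stands in for whatever the loop produces, and the `index` tie-breaker is an assumption added so that two dicts with equal scores are never compared with each other.

```python
import heapq

# Hypothetical stand-in for the scores produced inside the loop.
scores = [10, 40, 90, 15]

heap = []
for index, score in enumerate(scores):
    record = {"score": score}
    # heapq is a min-heap, so the score is negated to pop the largest first.
    # The index breaks ties, so the dicts themselves are never compared.
    heapq.heappush(heap, (-score, index, record))

# Pop lazily: each heappop is O(log n) and yields the next-largest record.
largest = heapq.heappop(heap)[2]
remaining = [heapq.heappop(heap)[2] for _ in range(len(heap))]

print(largest)    # {'score': 90}
print(remaining)  # [{'score': 40}, {'score': 15}, {'score': 10}]
```

Popping everything still costs O(n log n) overall, so this mainly helps when only the first few records are needed.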


Also, if your scores are fixed-width integers (say, integers from 0 to 100), you could use radix sort, which would take about 3n steps in this case (one pass per decimal digit), i.e. O(n).
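
As a rough illustration of the bounded-integer idea, here is a counting sort, which is a simpler relative of radix sort and not radix sort itself. It assumes every score is an integer between 0 and `max_score`; the function name and the `max_score` parameter are made up for this sketch.

```python
def counting_sort_desc(scores, max_score=100):
    """Return score records in descending order.

    Assumes every score is an integer with 0 <= score <= max_score.
    """
    counts = [0] * (max_score + 1)
    for score in scores:
        counts[score] += 1  # one O(1) step per record inside the loop

    result = []
    # Walk the buckets from the highest score down to zero.
    for value in range(max_score, -1, -1):
        result.extend({"score": value} for _ in range(counts[value]))
    return result


print(counting_sort_desc([10, 40, 90, 15]))
# [{'score': 90}, {'score': 40}, {'score': 15}, {'score': 10}]
```

This runs in O(n + k) time for n records and k possible score values, so it only pays off when the score range is small compared with the number of records.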

ForceBru
  • Is there any performance benefit of using this method? I have huge array with millions of records. The current method was taking a lot of time for me. – user_12 Nov 13 '20 at 14:34
  • "but it lets you extract the data in sorted order whenever you want". The structural property of a heap that allows finding the maximum to be efficient is generated during insertion, which means all the work is still being done before retrieving the elements. There is very little offloading of work. – Aplet123 Nov 13 '20 at 14:36
  • @Aplet123, yeah, you end up with `O(n log n)` complexity anyways. Maybe OP could ditch sorted lists entirely and just use a heap if the _main_ operation is inserting data (because inserts are about `O(log n)` and could be reduced to `O(1)`). – ForceBru Nov 13 '20 at 14:43
1

You could create a reverse bisect function to get the correct index for insertion.

Borrowing the code from this answer

def reverse_bisect(a, x, lo=0, hi=None):
    """Return the index at which x could be inserted in a assuming a
    is reverse-sorted.

    Optional args lo (default 0) and hi (default len(a)) bound the
    slice of a to be searched.
    """
    if lo < 0:
        raise ValueError('lo must be non-negative')
    if hi is None:
        hi = len(a)
    while lo < hi:
        mid = (lo+hi)//2
        if x > a[mid]: hi = mid
        else: lo = mid+1
    return lo

scores = []
for row in data:
    score = function_returns_a_score()
    idx = reverse_bisect(scores, score) # this is the correct index to insert the value
    scores.insert(idx, score)
final_lst = [{"score": score} for score in scores]

Complexity Analysis: (taking N = number of elements in array)

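Sorting takes O(NlogN) time. Binary search has a time complexity of O(logN), so calling it once for each of the N elements costs O(NlogN) in total, and N is smaller in the beginning. Note, however, that each list.insert also has to shift the elements after the insertion point, which is O(N), so the loop as a whole is O(N^2) in the worst case.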

Space Complexity: O(N). You are taking extra space here to store the values, so you may have to take that into account as well.
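
A possible alternative to writing a custom reverse bisect is to keep a parallel list of negated scores and use the standard bisect module on it. This is a minimal sketch under that assumption: the `incoming` list is a stand-in for the scores computed in the loop, and `neg_scores` is a helper name introduced here.

```python
import bisect

# Hypothetical stand-in for the scores produced inside the loop.
incoming = [10, 40, 90, 15]

neg_scores = []  # negated scores, kept in ascending order
final_lst = []   # records, kept in descending order of score

for score in incoming:
    # Ascending order of -score is the same as descending order of score.
    idx = bisect.bisect_right(neg_scores, -score)
    neg_scores.insert(idx, -score)           # shifts later elements: O(n)
    final_lst.insert(idx, {"score": score})  # shifts later elements: O(n)

print(final_lst)
# [{'score': 90}, {'score': 40}, {'score': 15}, {'score': 10}]
```

The search itself is O(log n), but both inserts shift elements, so for millions of records this is likely to be slower than a single sort after the loop.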


The answer below is for someone who wants to insert a value in "normal" (not reversed) order.

To insert the score at the right index, you need to know the correct index first. If the list is sorted (an empty list is sorted), we can use binary search to find the correct index to insert an element.

You could use the bisect module to find the index at which you need to insert your score.

import bisect

final_lst = []
scores = []
for row in data:
    score = function_returns_a_score()
    idx = bisect.bisect(scores, score) # this is the correct index to insert the value
    final_lst.insert(idx, {"score": score})
    scores.insert(idx, score)
Prashant Sengar
  • I am working on huge array with millions of records and I don't know much about bisect? Do you know if its faster than sorting after the loop? – user_12 Nov 13 '20 at 14:39
  • `bisect` finds the index in `logN` time. So for an array of N elements it takes `O(NlogN)` time complexity, the same as sorting it afterwards. Also, in the beginning, `N` is smaller. – Prashant Sengar Nov 13 '20 at 14:41
0

You can just sort the array in place after the loop:

final_lst = []
for row in data:
    score = function_returns_a_score()
    final_lst.append({"score": score})

final_lst.sort(key=lambda k: k["score"], reverse=True)

print(final_lst)
# returns
# [{'score': 90}, {'score': 40}, {'score': 15}, {'score': 10}]

If you really want to maintain a sorted list for whatever reason, then look into using a PriorityQueue with a class wrapping your objects for a custom comparison function.
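
One way this could look is sketched below. The `MaxScore` wrapper class is hypothetical: it reverses the `<` comparison so that queue.PriorityQueue, which normally serves the smallest item first, hands back the highest score first. The list of scores is a stand-in for the values computed in the loop.

```python
import queue


class MaxScore:
    """Hypothetical wrapper that makes higher scores compare as 'smaller'."""

    def __init__(self, record):
        self.record = record

    def __lt__(self, other):
        # Reversed on purpose so the queue serves the highest score first.
        return self.record["score"] > other.record["score"]


pq = queue.PriorityQueue()
for score in [10, 40, 90, 15]:  # stand-in for the scores from the loop
    pq.put(MaxScore({"score": score}))

ordered = []
while not pq.empty():
    ordered.append(pq.get().record)

print(ordered)
# [{'score': 90}, {'score': 40}, {'score': 15}, {'score': 10}]
```

queue.PriorityQueue adds locking for thread safety, so in single-threaded code the heapq module with the same wrapper would probably be the lighter choice.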

Aplet123
  • I was worried about the time and memory it takes, if we do it after the loop. I have a huge array like millions of records – user_12 Nov 13 '20 at 14:30
  • Priority queues/red-black trees have the same complexity as sorting the array after the loop. They're both `O(n*log(n))`, which is *basically* O(n) due to log growing so slowly. – Aplet123 Nov 13 '20 at 14:31