1

I have hundreds of lists that look like that:

list_1 = [10,20,30,40,70,90,230,450] # sorted list example
list_2 = [10,20,40,30,70,230,90,450] # partially sorted list example
list_a = [20,450,90,30,70,230,10,40] # messy/unsorted list example

Some lists are sorted, partially sorted and some are completly unsorted. I want to get the lists that are sorted and partially sorted. I know how to check if a list is sorted:

if sorted(lst) == lst:
    # code here

How do I get the partially sorted lists as well? How to define a threshold for what I mean with partially sorted? For example only 10% of numbers in a given list are not in correct order.

jonrsharpe
  • 115,751
  • 26
  • 228
  • 437
  • 1
    how exactly do you define "partially sorted" – Ohad Sharet Aug 06 '22 at 16:51
  • For example only 4 number swaps needed to achieve sorted state. Some numbers are in the correct position and some not. Or some numbers are not far away from their correct position (correct position = position if the list is sorted) – Jonas Gross Aug 06 '22 at 16:53
  • Does this answer your question? [Is there a way to measure how sorted a list is?](https://stackoverflow.com/questions/16994668/is-there-a-way-to-measure-how-sorted-a-list-is) – Dave Aug 07 '22 at 13:12

2 Answers2

2

I'd try to define it as

Partially sorted list has at most f(n) total distance between elements and their expected location.

The function f(n) itself should be defined according to your use case, and how "expensive" are out of orders.

Now, using this definition, simply sum the values of distances between each element and its expected location.

def is_partially_sorted(l):
  sorted_l = sorted(l)
  # binary_search() is doing binary search in sorted list to find x, and returns its index.
  diffs = [abs(idx - binary_search(sorted_l, x) for idx, x in enumerate(l)]  
  # f(x) is the threshold function described above.
  return sum(diffs) <= f(len(l))

(Not a native python coder, so hope this is sufficiently readable)

amit
  • 175,853
  • 27
  • 231
  • 333
  • That sounds like exactly what I want. But I cannot yet figure out the implementation. Still an amateur-python-programmer. Not sure how to implement line 4 in your code. – Jonas Gross Aug 06 '22 at 17:41
  • @JonasGross I am not following, line 4 is an implementation :) - or do you refer to implement binary_search. If the latter - there are plenty of sourcecs on that online. – amit Aug 06 '22 at 17:45
  • Ok, got it. Sorry, I am a bit slow. So the diffs-variable in your code is the sum of offsets for all elements in an unsorted list? – Jonas Gross Aug 06 '22 at 18:01
  • @JonasGross Not the sum, the distances of each element. `diffs[i]` stands for the distance between `l[i]`, to where it should have been if it was sorted (for example `diffs[4] == 0` if 5th element in place, and `diffs[9] == 1` if the 9th element is one place ahead). I later sum it in `sum(diffs)`. – amit Aug 06 '22 at 18:04
0

An inversion is a pair of elements that are out of order. I.e. indices i < j, but arr[i] > arr[j].

A reasonable metric of sortedness is the count of inversions, or normalized as the ratio of the count of inversions to the max possible inversions, which is choose(n,2).

You can find it in O(n log n) time with modified merge sort, as explained here.

Dave
  • 7,460
  • 3
  • 26
  • 39