4

This is a question is an extension of What's the most Pythonic way to identify consecutive duplicates in a list?.

Suppose you have a list of tuples:

my_list = [(1,4), (2,3), (3,2), (4,4), (5,2)]

and you sort it by each tuple's last value:

my_list = sorted(my_list, key=lambda tuple: tuple[1])
# [(3,2), (5,2), (2,3), (1,4), (4,4)]

then we have two consecutive runs (looking at the last value in each tuple), namely [(3,2), (5,2)] and [(1,4), (4,4)].

What is the pythonic way to reverse each run (not the tuples within), e.g.

reverse_runs(my_list)
# [(5,2), (3,2), (2,3), (4,4), (1,4)]

Is this possible to do within a generator?

UPDATE

It has come to my attention that perhaps the example list was not clear. So instead consider:

my_list = [(1,"A"), (2,"B"), (5,"C"), (4,"C"), (3,"C"), (6,"A"),(7,"A"), (8,"D")]

Where the ideal output from reverse_runs would be

[(7,"A"), (6,"A"), (1,"A"), (2,"B"), (3,"C"), (4,"C"), (5,"C"), (8,"D")]

To be clear on terminology, I am adopting the use of "run" as used in describing TimSort which is what Python's sort function is based upon - giving it (the sort function) its safety.

Thus if you sort on a collection, should the collection be multi-faceted, then only the specified dimension is sorted on and if two elements are the same for the specified dimension, their ordering will not be altered.

Thus the following function:

sorted(my_list,key=lambda t: t[1])

yields:

[(1, 'A'), (6, 'A'), (7, 'A'), (2, 'B'), (5, 'C'), (4, 'C'), (3, 'C'), (8, 'D')]

and the run on "C" (i.e. (5, 'C'), (4, 'C'), (3, 'C') ) is not disturbed.

So in conclusion the desired output from the yet to be defined function reverse_runs:

1.) sorts the tuples by their last element

2.) maintaining the order of the first element, reverses runs on the last element

Ideally I would like this in a generator functions, but that does not (to me at the moment) seem possible.

Thus one could adopt the following strategy:

1.) Sort the tuples by the last element via sorted(my_list, key=lambda tuple: tuple[1])

2.) Identify the indexes for the last element in each tuple when the succeeding tuple (i+1) is different than the last element in (i). i.e. identify runs

3.) Make an empty list

4.) Using the splice operator, obtain, reverse, and the append each sublist to the empty list

Community
  • 1
  • 1
SumNeuron
  • 4,850
  • 5
  • 39
  • 107

2 Answers2

4

I think this will work.

my_list = [(1,4), (2,3), (3,2), (4,4), (5,2)]
my_list = sorted(my_list, key=lambda tuple: (tuple[1], -tuple[0]))

print(my_list)

Output

[(5, 2), (3, 2), (2, 3), (4, 4), (1, 4)]

Misunderstood question. Less pretty but this should work for what you really want:

from itertools import groupby
from operator import itemgetter


def reverse_runs(l):
    sorted_list = sorted(l, key=itemgetter(1))
    reversed_groups = (reversed(list(g)) for _, g in groupby(sorted_list, key=itemgetter(1)))
    reversed_runs = [e for sublist in reversed_groups for e in sublist]

    return reversed_runs


if __name__ == '__main__':
    print(reverse_runs([(1, 4), (2, 3), (3, 2), (4, 4), (5, 2)]))
    print(reverse_runs([(1, "A"), (2, "B"), (5, "C"), (4, "C"), (3, "C"), (6, "A"), (7, "A"), (8, "D")]))

Output

[(5, 2), (3, 2), (2, 3), (4, 4), (1, 4)]
[(7, 'A'), (6, 'A'), (1, 'A'), (2, 'B'), (3, 'C'), (4, 'C'), (5, 'C'), (8, 'D')]

Generator version:

from itertools import groupby
from operator import itemgetter


def reverse_runs(l):
    sorted_list = sorted(l, key=itemgetter(1))
    reversed_groups = (reversed(list(g)) for _, g in groupby(sorted_list, key=itemgetter(1)))

    for group in reversed_groups:
        yield from group


if __name__ == '__main__':
    print(list(reverse_runs([(1, 4), (2, 3), (3, 2), (4, 4), (5, 2)])))
    print(list(reverse_runs([(1, "A"), (2, "B"), (5, "C"), (4, "C"), (3, "C"), (6, "A"), (7, "A"), (8, "D")])))
Tagc
  • 8,736
  • 7
  • 61
  • 114
  • Nice... Sort first by the second element, and *then* by the first element in each tuple. – blacksite Jan 27 '17 at 16:16
  • 1
    And then by the *negated value* of the first element. – Tagc Jan 27 '17 at 16:17
  • 1
    This may not work if `sorted(my_list, key=lambda t: t[0]) != my_list`. – David Eisenstat Jan 27 '17 at 16:23
  • This actually does not always work; consider: `my_list = [(1,"A"), (2,"B"), (5,"C"), (4,"C"), (3,"C"),(6,"A"),(7,"A"),(8,"D")]`. Simple sorting it preserves the non-ascending order of the run on `"C"` i.e. `sorted(my_list,key=lambda t: t[1]) #yields [(1, 'A'), (6, 'A'), (7, 'A'), (2, 'B'), (5, 'C'), (4, 'C'), (3, 'C'), (8, 'D')]`. Note that the code above generates `[(7, 'A'), (6, 'A'), (1, 'A'), (2, 'B'), (5, 'C'), (4, 'C'), (3, 'C'), (8, 'D')]` where it is clear that the run on "C" (e.g. 5, 4, 3) is not reversed from its original ordering. – SumNeuron Jan 27 '17 at 18:27
  • @SumNeuron Oh, I think I've misunderstood your question then. I thought you just wanted the second run to arrange in descending order of the first tuple element (i.e. the reverse of the default ascending order). In fact you want the runs to be unconditionally reversed regardless of how the first pass sorted them. – Tagc Jan 27 '17 at 18:37
  • @Tagc yea, that is my fault, it was a bit ambiguous. Please see the update :) – SumNeuron Jan 27 '17 at 18:41
  • @SumNeuron How about now? – Tagc Jan 27 '17 at 18:48
2

The most general case requires 2 sorts. The first sort is a reversed sort on the second criteria. The second sort is a forward sort on the first criteria:

pass1 = sorted(my_list, key=itemgetter(0), reverse=True)
result = sorted(pass1, key=itemgetter(1))

We can sort in multiple passes like this because python's sort algorithm is guaranteed to be stable.

However, in real life it's often possible to simply construct a more clever key function which allows the sorting to happen in one pass. This usually involves "negating" one of the values and relying on the fact that tuples order themselves lexicographically:

result = sorted(my_list, key=lambda t: (t[1], -t[0]))

In response to your update, it looks like the following might be a suitable solution:

from operator import itemgetter
from itertools import chain, groupby
my_list = [(1,"A"), (2,"B"), (5,"C"), (4,"C"), (3,"C"), (6,"A"),(7,"A"), (8,"D")]

pass1 = sorted(my_list, key=itemgetter(1))
result = list(chain.from_iterable(reversed(list(g)) for k, g in groupby(pass1, key=itemgetter(1))))
print(result)

We can take apart the expression:

chain.from_iterable(reversed(list(g)) for k, g in groupby(pass1, key=itemgetter(1)))

to try to figure out what it's doing...

First, let's look at groupby(pass1, key=itemgetter(1)). groupby will yield 2-tuples. The first item (k) in the tuple is the "key" -- e.g. whatever was returned from itemgetter(1). The key isn't really important here after the grouping has taken place, so we don't use it. The second item (g -- for "group") is an iterable that yields consecutive values that have the same "key". This is exactly the items that you requested, however, they're in the order that they were in after sorting. You requested them in reverse order. In order to reverse an arbitrary iterable, we can construct a list from it and then reverse the list. e.g. reversed(list(g)). Finally, we need to paste those chunks back together again which is where chain.from_iterable comes in.

If we want to get more clever, we might do better from an algorithmic standpoint (assuming that the "key" for the bins is hashible). The trick is to bin the objects in a dictionary and then sort the bins. This means that we're potentially sorting a much shorter list than the original:

from collections import defaultdict, deque
from itertools import chain

my_list = [(1,"A"), (2,"B"), (5,"C"), (4,"C"), (3,"C"), (6,"A"),(7,"A"), (8,"D")]

bins = defaultdict(deque)
for t in my_list:
    bins[t[1]].appendleft(t)

print(list(chain.from_iterable(bins[key] for key in sorted(bins))))

Note that whether this does better than the first approach is very dependent on the initial data. Since TimSort is such a beautiful algorithm, if the data starts already grouped into bins, then this algorithm will likely not beat it (though, I'll leave it as an exercise for you to try...). However, if the data is well scattered (causing TimSort to behave more like MergeSort), then binning first will possibly make for a slight win.

mgilson
  • 300,191
  • 65
  • 633
  • 696
  • Yes Python's sorting is built upon `TimSort` (hence this question's tag), whereby TimSort preserves "runs" when sorting; thereby giving us the ability to apply multiple sorts in succession to get a unique list. However this question can not simply rely on the built in sort function, as we wish to sort one (preserving runs) and then reverse those runs independently. – SumNeuron Jan 27 '17 at 18:29
  • @SumNeuron -- Specifically, `CPython` uses `TimSort`. Implementors are welcome to pick any algorithm they want as long as it's stable. But that's a nit. I'm not sure that I understand your statement "we with to sort one (preserving runs) and then reverse those runs independently". It sounds like you want to sort into ordered buckets and then sort the stuff in the buckets. With a stable sort, this can _always_ be accomplished by sorting everything by the latter criterion and then sorting again by the bucketing criterion. – mgilson Jan 27 '17 at 19:18
  • sorry, typo, "once". What you are describing is correct and that I am aware of for stable sorts. The issue is that each "bucket" needs to be reversed independently and in place i.e. if you had two buckets `[(1,2,3),(5,4,6)]` then they should become `[(3,2,1), (6,4,5)]`. Please see the updated example for a description of what is the desired outcome. – SumNeuron Jan 27 '17 at 21:00
  • @SumNeuron -- Ahh ... I think I understand now. I've updated. – mgilson Jan 27 '17 at 22:26
  • is there a way to do this without altering the default dictionary? – SumNeuron Jan 28 '17 at 09:39
  • I am marking this as the answer, but if you have the time, I would appreciate a thorough breakdown of how / why this work :) – SumNeuron Jan 30 '17 at 10:22
  • @SumNeuron -- It's somewhat explained in the text already. Which solution are you finding confusing? Maybe if I know what is difficult I can concentrate on that explanation to improve the answer. – mgilson Jan 30 '17 at 16:32
  • well I really do not want to change the defaultdict for starters.... but really this entire line is fairly confusing `list(chain.from_iterable(reversed(list(g)) for k, g in groupby(pass1, key=itemgetter(1)))` – SumNeuron Feb 05 '17 at 17:02
  • @SumNeuron -- I still don't understand why you don't want to change the defaultdict. You _need_ to change the defaultdict because it starts off empty. Re: that confusing line, I've added a paragraph that takes it apart piece by piece and hopefully explains it. – mgilson Feb 07 '17 at 16:57