
I have a data structure such that:

   [(1, '(A) Begin plans for world domination'), 
    (2, 'Listen to Symphony for the New World'), 
    (3, '(A) Change world'), 
    (4, '(D) Listen to Symphony for the New World'), 
    (5, '(C) Hello, World!'), (4, 'Improve Python todo project'), 
    (6, 'Begin plans for world domination'), 
    (7, '(A) Improve Python todo project')]

And I would like to sort first by the contents of the parentheses (if they have parentheses), and then by number.

I'm currently filtering for keywords, which works fine, but my sort is incorrect:

        filtered_items = [
            (i+1, line.strip()) for i, line in enumerate(items)
            if not args.filter_string or args.filter_string.lower() in line.lower()
        ]

        sorted_items = sorted(filtered_items, key=lambda item: (
            not item[1].startswith('('),
            item[1][:3],
            item[0],
        ))

The current sort yields this output:

1 (A) Begin plans for world domination
3 (A) Change world
7 (A) Improve Python todo project
5 (C) Hello, World!'), (4, 'Improve Python todo project
4 (D) Listen to Symphony for the New World
6 Begin plans for world domination
2 Listen to Symphony for the New World

The last 2 are the wrong way round.

Can anyone explain why this is?

  • Your sort is always based on the first 3 characters of the second item in the tuple. If that is not enough to get you going again let me know and I will give you a working solution – JonSG May 17 '23 at 14:49
  • Please explain "5 (C) Hello, World!'), (4, 'Improve Python todo project" in the output – DarkKnight May 17 '23 at 15:13

4 Answers


Your key tuple is (whether the parentheses are missing, first three characters, item number), so items without parentheses are still sorted alphabetically by their first three characters before the number is considered. You want the second element of the tuple to be conditional on the presence of parentheses:

sorted_items = sorted(
    filtered_items,
    key=lambda item: (not item[1].startswith("("), item[1][0:3] if item[1].startswith("(") else item[0]),
)
Michael Cao
  • Your code removes the integer from the sort key for items with a parenthesized letter. It happens to still work for the given input, since `filtered_items` is constructed in such a way that the integers are already ascending, so the [stability](https://stackoverflow.com/questions/1517793/what-is-stability-in-sorting-algorithms-and-why-is-it-important) of Python's sort means having the integer in the sort key isn't actually necessary, but it'll lead to [wrong results](https://ideone.com/R780tM) if the input can be something like `[(2, '(A) 2'), (1, '(A) 1')]`. – user2357112 May 17 '23 at 14:57
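
The point raised in the comment above can be checked with a small sketch. The first key mirrors the answer's lambda; `key_with_number` is a hypothetical variant (not from the original answer) that keeps the number as the final tie-breaker.

```python
# Input from the comment: same priority letter, numbers in descending order.
data = [(2, '(A) 2'), (1, '(A) 1')]

def key_without_number(item):
    # Mirrors the answer's lambda: parenthesized items carry no number in the key.
    has_paren = item[1].startswith('(')
    return (not has_paren, item[1][0:3] if has_paren else item[0])

def key_with_number(item):
    # Hypothetical variant: the prefix is used only for parenthesized items,
    # and the number is always the last element of the key.
    has_paren = item[1].startswith('(')
    return (not has_paren, item[1][0:3] if has_paren else '', item[0])

print(sorted(data, key=key_without_number))  # equal keys, so input order is kept
print(sorted(data, key=key_with_number))     # the number breaks the tie
```

With the first key both items compare equal, so the stable sort leaves them in input order; with the second, item 1 comes before item 2.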

You're using the first 3 characters of the string as part of the sort key, even if the string doesn't start with a parenthesis. The last 2 items are sorted that way because 'Beg' comes before 'Lis'.
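
As a quick illustration of the explanation above, this sketch builds the question's key for the two unparenthesized items that ended up swapped. The function name `original_key` is only a label chosen here; the key expression itself is the one from the question.

```python
# The two unparenthesized items that came out in the "wrong" order.
items = [
    (2, 'Listen to Symphony for the New World'),
    (6, 'Begin plans for world domination'),
]

def original_key(item):
    # Same key as in the question: the 3-character prefix outranks the number.
    return (not item[1].startswith('('), item[1][:3], item[0])

for item in items:
    print(original_key(item), item)

# 'Beg' < 'Lis', so item 6 sorts ahead of item 2 regardless of the numbers.
print(sorted(items, key=original_key))
```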

user2357112

Your key tuple needs to be conditional. The way it's constructed right now, the first three characters of the string take priority over the number even when there are no parentheses. Try this:

key=lambda item: (0, item[1][1], item[0]) if item[1].startswith("(") else (1, item[0])
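
For reference, here is a minimal runnable sketch of the same key applied to the data from the question. The function name `priority_key` is made up for this example, and it assumes the priority is the single character right after the opening parenthesis.

```python
data = [
    (1, '(A) Begin plans for world domination'),
    (2, 'Listen to Symphony for the New World'),
    (3, '(A) Change world'),
    (4, '(D) Listen to Symphony for the New World'),
    (5, '(C) Hello, World!'),
    (4, 'Improve Python todo project'),
    (6, 'Begin plans for world domination'),
    (7, '(A) Improve Python todo project'),
]

def priority_key(item):
    number, text = item
    if text.startswith('('):
        # Group 0: parenthesized items, by the character after '(' and then by number.
        return (0, text[1], number)
    # Group 1: everything else, by number only.
    return (1, number)

for number, text in sorted(data, key=priority_key):
    print(number, text)
```

The two key shapes never get compared element by element past the leading 0 or 1, so mixing a string and an integer in the second position does not raise a `TypeError`.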
tzaman
  • Your code removes the integer from the sort key for items with a parenthesized letter. It happens to still work for the given input, since `filtered_items` is constructed in such a way that the integers are already ascending, so the [stability](https://stackoverflow.com/questions/1517793/what-is-stability-in-sorting-algorithms-and-why-is-it-important) of Python's sort means having the integer in the sort key isn't actually necessary, but it'll lead to [wrong results](https://ideone.com/W65AxK) if the input can be something like `[(2, '(A) 2'), (1, '(A) 1')]`. – user2357112 May 17 '23 at 14:59
  • @user2357112 good catch, added that back in. – tzaman May 17 '23 at 15:01

Here's another way to do this.

The caveat is that this assumes the value in parentheses is a single uppercase letter.

data = [
    (1, '(A) Begin plans for world domination'), 
    (2, 'Listen to Symphony for the New World'), 
    (3, '(A) Change world'), 
    (4, '(D) Listen to Symphony for the New World'), 
    (5, '(C) Hello, World!'),
    (4, 'Improve Python todo project'), 
    (6, 'Begin plans for world domination'), 
    (7, '(A) Improve Python todo project')]

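def k(t):
    i, s = t
    # Parenthesized items sort by their letter; all others sort after 'Z', by number.
    # The number i (rather than the string) breaks ties within a letter group.
    return ord(s[1]) if s[0] == '(' else i + ord('Z'), i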

print(*sorted(data, key=k), sep='\n')

Output:

(1, '(A) Begin plans for world domination')
(3, '(A) Change world')
(7, '(A) Improve Python todo project')
(5, '(C) Hello, World!')
(4, '(D) Listen to Symphony for the New World')
(2, 'Listen to Symphony for the New World')
(4, 'Improve Python todo project')
(6, 'Begin plans for world domination')

...which, based on the description in the question, is what I believe the output should look like.

DarkKnight