4

My aim is to sort a list of strings where words have to be sorted alphabetically.Except words starting with "s" should be at the start of the list (they should be sorted as well), followed by the other words.

The below function does that for me.

def mysort(words):
    mylist1 = sorted([i for i in words if i[:1] == "s"])
    mylist2 = sorted([i for i in words if i[:1] != "s"])
    list = mylist1 + mylist2
    return list

I am just looking for alternative approaches to achieve this or if anyone can find any issues with the code above.

Nandhiya
  • 203
  • 2
  • 5
  • 22
misguided
  • 3,699
  • 21
  • 54
  • 96
  • Should the words starting with the letter s be sorted according to the same principle? Eg ['sss', 'ssa', 'ssb', 'sas', 'sab']? – blablatros Jul 12 '13 at 06:13
  • @blablatros nope only the first letter should be considered, so in your example ['sab','sas','ssa','ssb','sss'] – misguided Jul 14 '13 at 22:27

4 Answers4

11

You could do it in one line, with:

sorted(words, key=lambda x: 'a' + x if x.startswith('s') else 'b' + x)

The sorted() function takes a keyword argument key, which is used to translate the values in the list before comparisons are done.

For example:

sorted(words, key=str.lower)
    # Will do a sort that ignores the case, since instead
    # of checking 'A' vs. 'b' it will check str.lower('A')
    # vs. str.lower('b').

sorted(intlist, key=abs)
    # Will sort a list of integers by magnitude, regardless
    # of whether they're negative or positive:
    # >>> sorted([-5,2,1,-8], key=abs)
    #     [1, 2, -5, -8]

The trick I used translated strings like this when doing the sorting:

"hello" => "bhello"  
"steve" => "asteve"

And so "steve" would come before "hello" in the comparisons, since the comparisons are done with the a/b prefix.

Note that this only affects the keys used for comparisons, not the data items that come out of the sort.

paxdiablo
  • 854,327
  • 234
  • 1,573
  • 1,953
integer
  • 1,075
  • 7
  • 12
  • Apologies, I'm new to this.If possible can you explain the above expression in words – misguided Jul 12 '13 at 06:09
  • +1. Definitely better than mine, as it is easier and also handles the (improbable) case that a string starts with `\x00` or `\x01`. – glglgl Jul 12 '13 at 06:14
  • 1
    @Harold: you are proposing the decorate-sort-undecorate pattern that is exactly what `key=...` was introduced for. To me making a tuple instead of inserting an extra char at beginning is slightly better but the latter is not that terrible either. – 6502 Jul 12 '13 at 06:19
  • sorry what does 'a' and 'b' signify in the expression? – misguided Jul 12 '13 at 06:21
  • 1
    the only problem I have with this is `if x[:1] == 's'`. You could use `x[0]` but better is `if x.startswith('s')`. This is actually discussed specifically in PEP8 – Ryan Haining Jul 12 '13 at 06:46
  • @misguided he's just appending the letter `'a'` to the front of each string that starts with `'s'` and `'b'` to the front of everything else. That way when the strings are sorted, everything that he added an `'a'` to will be placed before everything else – Ryan Haining Jul 12 '13 at 06:50
  • @RyanHaining: `x[:1]` doesn't crash on empty strings while `x[0]` does. For checking just the first char `x[:1]` is faster than `startswith`. – 6502 Jul 12 '13 at 07:51
  • @RyanHaining: yeah, like 6502 said, the only reason I used x[:1] is because x[0] would throw on the empty string. x.startswith() is also fine. – integer Jul 12 '13 at 08:32
  • integer, made some formatting tweaks and used just the `startswith` choice. Makes little sense to say there are these two ways but the first way shouldn't be used. Better just to give the preferred one and not confuse newbies :-) – paxdiablo Jun 29 '21 at 07:23
5

1 . You can use generator expression inside sorted.

2 . You can use str.startswith.

3 . Don't use list as a variable name.

4 . Use key=str.lower in sorted.

mylist1 = sorted((i for i in words if i.startswith(("s","S"))),key=str.lower)
mylist2 = sorted((i for i in words if not i.startswith(("s","S"))),key=str.lower)
return mylist1 + mylist2

why str.lower?

>>> "abc" > "BCD"
True
>>> "abc" > "BCD".lower()  #fair comparison
False
Ashwini Chaudhary
  • 244,495
  • 58
  • 464
  • 504
  • Hi Ashwini does one have advantage over other ? i.e i.startswith(("s","S") faster / more efficient that i[:1] == "s" ? – misguided Jul 12 '13 at 05:59
  • @misguided can't say about speed, but it's the preferred to check the start of the string. Secondly it can also accept a tuple of strings. – Ashwini Chaudhary Jul 12 '13 at 06:02
  • I'd prefer w[0] == 's'. This would be a good choice for the generator expression, which shouldn't be so big. – gukoff Jul 12 '13 at 06:05
  • In theory `startswith` should be faster but in the past I found it to be much slower that just using `s[:1] in (...)`. May be the reason is that it does `s[len(x)] == x`... I didn't check the implementation. – 6502 Jul 12 '13 at 06:36
  • @6502 It could be slower, because it requires a function call. Even an empty function call takes some nano seconds. But it has it's advantages, suppose if you want to check multiple prefixes like `'s'`,`'sa'`,`'sab'`,etc then you're out of luck with slicing, but `str.startswith` will still work like a BOSS!. – Ashwini Chaudhary Jul 12 '13 at 06:40
  • @6502 Ah! found it. I remember Andy Hayden asked this question : http://stackoverflow.com/questions/13270888/why-is-startswith-slower-than-slicing – Ashwini Chaudhary Jul 12 '13 at 06:47
1
>>> l = ['z', 'a', 'b', 's', 'sa', 'sb', '', 'sz']
>>> sorted(l, key=lambda x:(x[0].replace('s','\x01').replace('S','\x01') if x else '') + x[1:])
['', 's', 'sa', 'sb', 'sz', 'a', 'b', 'z']

This key function replaces, for the purpose of sorting, every value starting with S or s with a \x01 which sorts before everything else.

glglgl
  • 89,107
  • 13
  • 149
  • 217
  • 2
    pointlessy complex... and what about if a string is already starting with `0x01`. What about `0x00`? – 6502 Jul 12 '13 at 06:08
  • @6502 It is more complex, but does everything in one run. And I suppose "string" means "readable text here", so I don't expect `0x00` and `0x01` to occur. Nevertheless, [integer's answer](http://stackoverflow.com/a/17608338/296974) is better. – glglgl Jul 12 '13 at 06:13
1

One the lines of Integer answer I like using a tuple slightly better because is cleaner and also more general (works for arbitrary elements, not just strings):

sorted(key=lambda x : ((1 if x[:1] in ("S", "s") else 2), x))

Explanation:

The key parameter allows sorting an array based on the values of f(item) instead of on the values of item where f is an arbitray function.

In this case the function is anonymous (lambda) and returns a tuple where the first element is the "group" you want your element to end up in (e.g. 1 if the string starts with an "s" and 2 otherwise).

Using a tuple works because tuple comparison is lexicographical on the elements and therefore in the sorting the group code will weight more than the element.

6502
  • 112,025
  • 15
  • 165
  • 265