
I have the following list of tuples, already sorted with Python's "sorted":

L = [("1","blaabal"),
     ("1.2","bbalab"),
     ("10","ejej"),
     ("11.1","aaua"),
     ("12.1","ehjej"),
     ("12.2 (c)", "ekeke"), 
     ("12.2 (d)", "qwerty"), 
     ("2.1","baala"),
     ("3","yuio"),
     ("4","poku"),
     ("5.2","qsdfg")]

My problem, as you can see, is that the order looks right at first, but after "12.2 (d)" the list restarts at "2.1". I don't know how to solve this problem.

Thanks

moooeeeep
geilelou
  • What were you expecting? The numbers were sorted as strings so `'12...'` comes before `'2'` – Moses Koledoye Oct 04 '16 at 12:59
  • It's because they're sorted as strings lexicographically. – Ilja Everilä Oct 04 '16 at 12:59
  • That happens because your tuples contain strings and strings are sorted lexicographically. The answer by Suever will work, but ask yourself why they are strings to begin with. – L3viathan Oct 04 '16 at 12:59
  • Maybe this is an even better duplicate: [Does Python have a built in function for string natural sort?](http://stackoverflow.com/q/4836710/1025391) – moooeeeep Oct 04 '16 at 13:00
  • Actually, my main problem is dealing with "12.2 (d)": adding a third layer that cannot be turned into a number is complicated – geilelou Oct 04 '16 at 13:01
  • Some of the strings also contain letters and symbols, e.g. "12.2 (d)". I think you need to clean up the data before trying to sort. How about using a regex to extract just the numbers? It depends on what's noise and what's signal in that mess – Nath Oct 04 '16 at 13:02
  • See andreypopp's answer in [Sorting a list of dot separated numbers, like software versions](http://stackoverflow.com/q/2574080/4014959) – PM 2Ring Oct 04 '16 at 13:16
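The comments suggest treating the leading number as dot-separated parts (as in the linked software-versions question) rather than as one float. Below is a minimal sketch of that idea. It assumes every label starts with digits separated by dots, optionally followed by a suffix such as "(c)". `section_key` and `NUMBER_PREFIX` are hypothetical names, and this is not the linked answer's exact code:

```python
import re

# Hypothetical pattern: a dotted number ("12.2") followed by any suffix (" (c)").
NUMBER_PREFIX = re.compile(r"(\d+(?:\.\d+)*)(.*)")

def section_key(item):
    """Sort key: dotted number as a tuple of ints, then the suffix, then item[1]."""
    label = item[0]
    match = NUMBER_PREFIX.match(label)
    if match is None:
        # Assumption: every label starts with a number; anything else is an error.
        raise ValueError(f"label does not start with a number: {label!r}")
    number, rest = match.groups()
    parts = tuple(int(p) for p in number.split("."))
    return (parts, rest.strip(), item[1])

# Hypothetical labels, including "1.9" and "1.10" to show how the parts compare.
labels = [("12.2 (d)", "qwerty"), ("1.10", "x"), ("2.1", "baala"),
          ("1.9", "y"), ("12.2 (c)", "ekeke")]
print([label for label, _ in sorted(labels, key=section_key)])
# ['1.9', '1.10', '2.1', '12.2 (c)', '12.2 (d)']
```

Comparing the parts as integers keeps "1.10" after "1.9", which a float conversion would not do, because float("1.10") is 1.1.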

3 Answers


There's a package made specifically for your case called natsort:

>>> from natsort import natsorted
>>> L = [('1', 'blaabal'), ('4', 'poku'), ('12.2 (c)', 'ekeke'), ('12.1', 'ehjej')]
>>> natsorted(L)
[('1', 'blaabal'), ('4', 'poku'), ('12.1', 'ehjej'), ('12.2 (c)', 'ekeke')]
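If installing a third-party package isn't an option, you can roughly approximate natural sorting with the standard library: split each label into digit and non-digit runs and compare the digit runs as integers. This is a minimal sketch, not natsort's actual algorithm, and `natural_key` is a hypothetical helper name. Like a plain integer-based natural sort, it treats "1.2" as the integers 1 and 2 separated by ".", not as a float:

```python
import re

def natural_key(text):
    # re.split with a capturing group keeps the digit runs, so the result
    # alternates non-digit text (even indices) and digit runs (odd indices).
    # Converting the digit runs to int makes "2" sort before "12".
    return [int(tok) if i % 2 else tok
            for i, tok in enumerate(re.split(r"(\d+)", text))]

L = [('1', 'blaabal'), ('4', 'poku'), ('12.2 (c)', 'ekeke'), ('12.1', 'ehjej')]
print(sorted(L, key=lambda t: (natural_key(t[0]), t[1])))
# [('1', 'blaabal'), ('4', 'poku'), ('12.1', 'ehjej'), ('12.2 (c)', 'ekeke')]
```

Because the runs always alternate, each position compares str with str or int with int, so the comparison never mixes types.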
skovorodkin

Since the first element in each tuple is a string, Python is performing lexicographic sorting, in which all strings that start with '1' come before any string that starts with '2'.

To get the sorting you desire, you'll want to treat the first entry as a float instead of a string.

We can use sorted along with a custom key function that converts the numeric part of the first entry to a float before comparing. The key also includes the second tuple element, as a tie-breaker for cases where the first entries are not unique.

result = sorted(L, key=lambda x: (float(x[0].split()[0]), x[1]))

# [('1', 'blaabal'), ('1.2', 'bbalab'), ('2.1', 'baala'), ('3', 'yuio'), ('4', 'poku'), ('5.2', 'qsdfg'), ('10', 'ejej'), ('11.1', 'aaua'), ('12.1', 'ehjej'), ('12.2 (c)', 'ekeke'), ('12.2 (d)', 'qwerty')]

I added x[0].split()[0] to split the first tuple element at whitespace and keep only the first piece, because some values, such as '12.2 (d)', contain more than a number and we only want the '12.2'.

If the part of the first element that we've discarded matters, you could use a key function like the following. It splits the first element into pieces, converts only the first piece to a float and leaves the rest as strings.

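def sorter(value):
    parts = value[0].split()

    # Convert the first part to a number and leave all other parts as strings,
    # so "12.2 (c)" becomes [12.2, "(c)"] and compares before [12.2, "(d)"]
    parts[0] = float(parts[0])

    # Break any remaining ties with the second tuple element
    return (parts, value[1])

result = sorted(L, key=sorter)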
Suever
  • This won't work for some of the items, e.g. `"12.2 (d)"` – Farhan.K Oct 04 '16 at 12:59
  • @Farhan.K Thanks updated. – Suever Oct 04 '16 at 13:00
  • Wouldn't it be better to just write `sorted(L, key=lambda tup: float(tup[0].split()[0]))`? You should only care about the number as an identifier value. – Nf4r Oct 04 '16 at 13:02
  • @Nf4r In this case it would work since all of the first entries are unique. If they were not unique though, you'd likely want to use the second tuple element for sorting. – Suever Oct 04 '16 at 13:05
  • Well yeah, it depends on the case, whether you care about the order of the 2nd item or not. But now that I see "12.2 (c)" and "12.2 (d)", I think he does care about it, so your solution is fine. – Nf4r Oct 04 '16 at 13:09
  • Your solution ignores the second part of the split string. With a stable sorting algorithm, `12.2 (d)` might come before `12.2 (c)` under your current sort key if it appears first in the input. – moooeeeep Oct 04 '16 at 13:19
  • @moooeeeep That's a good point. Depending on the data that could be problematic. You could modify this but really natural sorting would be ideal if that level of detail matters. It may be that the second element of the tuple's ordering would correspond with the `(c)` and `(d)` – Suever Oct 04 '16 at 13:29
  • Well, you could also add the second half of the split string to the sort key (between the number and the string). Given that the list looks like the scrambled outline of some document, maybe the ordering could ignore the second tuple elements altogether, as the first elements (number plus letter in parens) should be unique anyway. – moooeeeep Oct 04 '16 at 14:49
  • @moooeeeep Added an example of how that could be handled – Suever Oct 04 '16 at 14:55
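Here is a small check of moooeeeep's point on hypothetical data. The second elements "aaa" and "zzz" are made up so that their order disagrees with the (c)/(d) suffixes. `number_only` and `number_then_second` are illustrative names for the float-only key from the comments and the first key in this answer. `sorter` is the answer's second key:

```python
# Hypothetical input: "aaa"/"zzz" are chosen to contradict the (c)/(d) suffixes.
data = [("12.2 (d)", "aaa"), ("12.2 (c)", "zzz"), ("2.1", "baala")]

def number_only(item):
    # Numeric prefix only: ties keep their input order because sorted() is stable.
    return float(item[0].split()[0])

def number_then_second(item):
    # Numeric prefix, then the second tuple element as the tie-breaker.
    return (float(item[0].split()[0]), item[1])

def sorter(value):
    # Numeric prefix, then the remaining label pieces, then the second element.
    parts = value[0].split()
    parts[0] = float(parts[0])
    return (parts, value[1])

for key in (number_only, number_then_second, sorter):
    print(key.__name__, [label for label, _ in sorted(data, key=key)])
# number_only ['2.1', '12.2 (d)', '12.2 (c)']
# number_then_second ['2.1', '12.2 (d)', '12.2 (c)']
# sorter ['2.1', '12.2 (c)', '12.2 (d)']
```

Only the key that looks at the suffix puts "(c)" before "(d)" regardless of input order or second elements.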

The first values of your tuples are strings, so they are being sorted in lexicographic order. If you want them to remain strings, sort with a key that converts only the numeric part to a float:

sorted(L, key=lambda x: float(x[0].split()[0]))
Patrick Haugh