-1

I need to sort a list of strings in a way that is very similar to the built-in sorted function, but with one important distinction. As you know, sorted orders the space character before digit characters, so sorted(['1 ', ' 9']) gives [' 9', '1 ']. I need a sort that orders digit characters before the space character, so in this example the result would be ['1 ', ' 9'].

Update

As I understand it, the default behaviour of sorted relies on the order of characters in the ASCII 'alphabet' (i.e. ''.join([chr(i) for i in range(59, 127)])), so I decided to implement my own ASCII 'alphabet' in the my_ord function.

I planned to use this function together with a simple my_sort function as the key for sorted,

import string

def my_ord(c):
    punctuation1 = ''.join([chr(i) for i in range(32, 48)])
    other_stuff = ''.join([chr(i) for i in range(59, 127)])
    my_alphabet = string.digits + punctuation1 + other_stuff
    return my_alphabet.find(c)

def my_sort(w):
    return sorted(w, key=my_ord)

like this: sorted([' 1 ', 'abc', ' zz zz', '9 '], key=my_sort).

What I'm expecting in this case is ['9 ', ' 1 ', ' zz zz', 'abc']. Unfortunately, the result not only fails to match the expected one, it also differs from time to time.

halfer
mr_bulrathi
  • The key changes how elements are compared, but it won't change the values in the result. If the input contains `' 9'` you can't get `'9 '` in the result. – Barmar May 24 '18 at 18:09
  • In addition to changing sort order in your example, you also moved spaces to the end. Was that intentional? If that is a requirement, it would be best to point that out. – tdelaney May 24 '18 at 18:29
  • Can the numbers be multiple digits? If so, how are they sorted? Should `5` precede `11` (lexical sort) or the other way around (numerical sort)? – tdelaney May 24 '18 at 18:32
  • stackoverflow sent me this suggesting closing, chose to edit it to better shape... I took the liberty of also supposing that moving the spaces to end was a mistake, if not, the answer would be the same, only instead of the strip function, you would need to define your own, slightly modified strip... – ntg May 24 '18 at 18:56
  • I rolled back @ntg's changes. We encourage people to show how they've attempted to solve the problem and we certainly shouldn't change the author's expected result without confirmation from the author first. – tdelaney May 24 '18 at 19:13
  • I see. I liked my title better and would still fix the ' 9' typo though (unless there is some indication it was not a typo?) Last comment: I am not sure I like this question,and if I unedited, I would close. I do like the answer (+1) though. – ntg May 24 '18 at 19:41
  • @Barmar, I don't need to change the elements of the list. I need to sort the elements – mr_bulrathi May 24 '18 at 19:54
  • You said that you want to get an answer with `'9 '` in it, that's a change from `' 9'` in the input. – Barmar May 24 '18 at 19:55
  • @tdelaney, yes, this example shows the desired result – mr_bulrathi May 24 '18 at 19:55
  • @tdelaney: '@Barmar, I don't need to change the elements of the list. I need to sort the elements – mr_bulrathi' .... – ntg May 24 '18 at 20:03
  • @tdelaney, yes, as I said, the desired behavior must be similar to the original `sorted` except that digit chars must be ordered before space chars. Lexical sort for chars, numerical sort for digits, all as in `sorted` (with the exception that I've already mentioned) – mr_bulrathi May 24 '18 at 20:05
  • @ntg, sorry, I didn't mention that the elements in the resulting sorted list must not be changed – mr_bulrathi May 24 '18 at 20:06
  • You say the example shows the desired result but that the elements must not be changed. But the example shows that the elements are changed. If your data includes other characters or multiple digits it would be helpful to include a couple of samples and desired results. Make it easy for us to test our suggestions by pasting your desired result into our code for validation. – tdelaney May 24 '18 at 20:10
  • @tdelaney, yep, that was a typo, my bad, posted this question in the subway, vibration + late evening :( – mr_bulrathi May 24 '18 at 20:11
  • @mr_bulrathi I supposed it was this. The contradiction comes from the fact that your example changes one of the list elements from space_nine to nine_space, so in your example your list is changed. Fixed it but was reverted... I would also suggest changing the title: what you want to do is called 'whitespace stripping', so I would go with something like 'Sort list of strings by whitespace stripped key' – ntg May 24 '18 at 20:12
  • @mr_burathi last comment: You should checkout the answer and if it works, accept it. It is probably a bit better than the answer I would give you (which would involve https://docs.scipy.org/doc/numpy/reference/generated/numpy.argsort.html) – ntg May 24 '18 at 20:20
  • @ntg, your idea sounds smart, i’ll check it tomorrow (it’s 23:20 PM in my city, gnite my friends) – mr_bulrathi May 24 '18 at 20:25
  • @mr_bulrathi Thanks though in this case nuric 's seems smarter and simpler ;) – ntg May 24 '18 at 20:29

4 Answers

2

You can use str.lstrip as the key function to ignore the leading whitespace at the front of each string.

r = sorted(['1 ', ' 9' , ' 4', '2 '], key=str.lstrip)
# r == ['1 ', '2 ', ' 4', ' 9']

key specifies a function of one argument that is used to extract a comparison key from each list element (see the documentation for sorted).
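
To make the role of key concrete, here is a minimal sketch (the variable names are only illustrative) showing that the key is used for comparison while the returned elements are the original, unstripped strings:

```python
words = ['1 ', ' 9', ' 4', '2 ']

# The comparison keys that sorted() derives from each element.
keys = [str.lstrip(w) for w in words]

# The result still contains the original strings, leading spaces included.
result = sorted(words, key=str.lstrip)

print(keys)
print(result)
```

Only the ordering is affected by the key; no element is modified.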

nuric
  • this won't help in this case: `['1 a', '11a']`. The output will be `['1 a', '11a']`. According to description (i.e. digits must be prior to spaces), the result must be `['11a', '1 a', ]` – mr_bulrathi May 24 '18 at 19:49
1

Try this

import string
MY_ALPHABET = (
        string.digits
        + ''.join([chr(i) for i in range(32, 127) if chr(i) not in string.digits])
)
inp = [' 1 ', 'abc', ' zz zz', '9 ', 'a 1', 'a ']
print(inp, '-->', sorted(inp, key=lambda w: [MY_ALPHABET.index(c) for c in w]))
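
A possible variation on the same idea, as a sketch: build a rank lookup table once instead of calling MY_ALPHABET.index for every character. The names RANK and digits_first_key are hypothetical, and the fallback for characters outside printable ASCII is an assumption, not something the question specifies.

```python
import string

# Custom "alphabet": digits first, then the remaining printable ASCII
# characters in their usual order (so space comes right after the digits).
_ORDER = string.digits + ''.join(
    chr(i) for i in range(32, 127) if chr(i) not in string.digits
)

# Map each character to its position in the custom alphabet.
RANK = {c: i for i, c in enumerate(_ORDER)}

def digits_first_key(s):
    """Return a list of ranks usable as a sort key for s."""
    # Assumption: characters outside the table sort after all printable
    # ASCII characters, ordered among themselves by code point.
    return [RANK.get(c, len(RANK) + ord(c)) for c in s]

print(sorted([' 1 ', 'abc', ' zz zz', '9 '], key=digits_first_key))
print(sorted(['1 a', '11a'], key=digits_first_key))
```

Because each string is mapped to a list of ranks, the comparison stays character by character, just with a different character order.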
0

You want a combination of lexical and numerical sorting. You can do that by chopping up the string into a tuple and converting the digits to int. Now the tuple compare will consider each element by its own comparison rules.

I used a regex to split the string into (beginning text, whitespace, digits, everything else), converted the digits to an int, and used the result as the key. If the string doesn't match the pattern, the function just returns the original string in a one-element tuple so that it can still be used for comparison.

In the key, I moved the whitespace that precedes the digits (group(2)) to after the number, but it may make more sense to leave it out of the comparison completely.

import re

test = ['1  ', ' 9']
wanted = ['1  ', ' 9']

def sort_key(val):
    """Return a tuple of (text, int, spaces, remainder) or just
    (text,) suitable for sorting text lexicographically but an
    embedded number numerically"""
    m = re.match(r"(.*?)(\s*)(\d+)(.*)", val)
    if m:
        return (m.group(1), int(m.group(3)), m.group(2), m.group(4))
    else:
        return (val,)

result = sorted(test, key=sort_key)
print(test, '-->', result)
assert result == wanted, "results compare"
tdelaney
  • Let's try it on the following input: `test = [' 1 ', 'abc', ' zz zz', '9 ']`. The `wanted`, in this case, will be the `['9 ', ' 1 ', ' zz zz', 'abc']`. But alas, the `result` is `[' 1 ', '9 ', ' zz zz', 'abc']` which is not the expected result :( – mr_bulrathi May 25 '18 at 09:45
  • Please read what you wrote, you have a copy paste error.... But furthermore, can you explain why ' zz zz' should be before 'abc' based on your description? – ntg May 25 '18 at 11:22
0

For completeness and maybe efficiency in extreme cases, here is a solution using numpy argsort:

import numpy as np
lst = ['1 ', ' 9' , ' 4', '2 ']
order = np.argsort(np.array([s.lstrip() for s in lst]))
result = list(np.array(lst)[order])

Overall, I think that using sorted(..., key=...) is generally superior, and this solution makes more sense if the input is already a numpy array. On the other hand, it calls lstrip() only once per item (as sorted with key= also does) and does the sorting inside numpy, so it is possible that for large enough lists it could be faster. Additionally, it produces order, which shows where each sorted element was in the original list.
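
If numpy is not available, the same order information can be obtained in plain Python. This is only a sketch of an equivalent, not part of the numpy approach itself:

```python
lst = ['1 ', ' 9', ' 4', '2 ']

# order[k] is the index in lst of the element that lands at position k.
order = sorted(range(len(lst)), key=lambda i: lst[i].lstrip())

# Reorder the original (unstripped) strings by those indices.
result = [lst[i] for i in order]

print(order)
print(result)
```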

As a last comment: judging from the code you provide, but not from the example you give, I am not sure whether you just want to strip the leading whitespace or do more, e.g. strip punctuation (see "Best way to strip punctuation from a string in Python"), or first order on the string without punctuation and then, if two keys are equal, order on the rest (the solution by tdelaney). In any case, it might not be a bad idea to compile a pattern, e.g.

import numpy as np
import re
pattern = re.compile(r'[^\w]')
lst = ['1 ', ' 9' , ' 4', '2 ']
order = np.argsort(np.array([pattern.sub('',s) for s in lst]))
result = list(np.array(lst)[order])

or:

import re
pattern = re.compile(r'[^\w]')
r = sorted(['1 ', ' 9', ' 4', '2 '], key=lambda s: pattern.sub('', s))
ntg
  • Thanks for your effort, but the desired result won't be reached. For example, sorted `[' 1 ', 'abc', ' zz zz', '9 ']`, according description, must be `['9 ', ' 1 ', ' zz zz', 'abc']`, but the provided solution will give `[' 1 ', '9 ', 'abc', ' zz zz']`. Anyway, thanks for your time – mr_bulrathi May 25 '18 at 08:43