You can skip straight to FINALLY and use the provided code for sorting Python lists of strings the way R would sort them, or read the answer from top to bottom and learn a bit about Python along the way:
As Rawson already mentioned in a comment to your question (with a helpful link), you can define the sort order for any characters you want to take out of the usual sorting order:
t = ['1&2', '1_2']
print(sorted(t))
alphabet = {"_":-2, "&":-1}
def sortkey(word):
    # Custom rank for redefined characters, code point for all others.
    # Equivalent to:
    # [alphabet[ch] if ch in alphabet else ord(ch) for ch in word]
    return [alphabet.get(ch, ord(ch)) for ch in word]
print(sortkey(t[0]), sortkey(t[1]))
print(sorted(t, key=sortkey))
gives:
['1&2', '1_2']
[49, -1, 50] [49, -2, 50]
['1_2', '1&2']
Use negative values to define the alphabet order, so that you can fall back on ord() for every character you have not redefined (advantage: this avoids possible problems with Unicode strings).
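To illustrate the point about Unicode, here is a small sketch that feeds a non-ASCII character through the same kind of key function. The sample list is made up for illustration; any character missing from the dictionary simply falls back to its code point.

```python
# Redefined characters get negative ranks, so they sort before everything else.
alphabet = {"_": -2, "&": -1}

def sortkey(word):
    # Characters not in the dictionary fall back to their Unicode code point.
    return [alphabet.get(ch, ord(ch)) for ch in word]

words = ["1é2", "1a2", "1&2", "1_2"]  # hypothetical sample data
result = sorted(words, key=sortkey)
print(result)
```

Because every code point is non-negative, the redefined characters always end up in front of the untouched ones, no matter which other characters appear in the strings.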
If you want to redefine many of the characters and only use printable ones, you can also define your own alphabet string as follows:
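# '_' and '&' have swapped places relative to ASCII order;
# the raw string keeps the backslash literal without an escape warning
alphabet = r"""0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ!"#$%_'()*+,-./:;<=>?@[\]^&`{|}~"""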
and then sort by it:
print(sorted(t, key=lambda s: [alphabet.index(c) for c in s]))
When sorting large amounts of data, consider turning the alphabet into a dictionary, because alphabet.index() has to search the string for every character:
dict_alphabet = {c: i for i, c in enumerate(alphabet)}
print(sorted(t, key=lambda s: [dict_alphabet[c] for c in s]))
or, best of all, use Python's built-in character translation for strings:
alphabet = r"""0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ!"#$%_'()*+,-./:;<=>?@[\]^&`{|}~"""
table = str.maketrans(alphabet, ''.join(sorted(alphabet)))
print(sorted(t, key=lambda s: s.translate(table)))
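As a quick sanity check, the three variants above (index lookup, dictionary, translation table) should produce the same order. The sketch below compares them on a small made-up sample; the function names and the sample list are only illustrative.

```python
alphabet = r"""0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ!"#$%_'()*+,-./:;<=>?@[\]^&`{|}~"""

rank = {c: i for i, c in enumerate(alphabet)}
table = str.maketrans(alphabet, "".join(sorted(alphabet)))

def key_index(s):
    # linear search in the alphabet string for every character
    return [alphabet.index(c) for c in s]

def key_dict(s):
    # constant-time dictionary lookup for every character
    return [rank[c] for c in s]

def key_translate(s):
    # one C-level pass that maps each character to its "sorted" counterpart
    return s.translate(table)

sample = ["a_b", "a&b", "Zz", "zZ", "1_2", "1&2", "~", "0"]  # hypothetical data
by_index = sorted(sample, key=key_index)
by_dict = sorted(sample, key=key_dict)
by_translate = sorted(sample, key=key_translate)
print(by_index)
print(by_index == by_dict == by_translate)
```

Note that the first two variants raise an error for characters missing from the alphabet, while translate() leaves such characters unchanged.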
By the way: you can get the printable ASCII characters from Python's string module:
import string
print(string.printable) # includes Form-Feed, Tab, VT, ...
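If the whitespace and control characters in string.printable are unwanted, the narrower constants of the string module can be combined instead. A minimal sketch; the name `visible` is only illustrative:

```python
import string

# digits, letters and punctuation only: no space, tab, newline, VT or form feed
visible = string.digits + string.ascii_letters + string.punctuation

print(len(string.printable), len(visible))
print(visible)
```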
FINALLY
Below is ready-to-use Python code that sorts lists of strings by the character order R uses:
Rcode = """\
s <- "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ!#$%&()*+,-./:;<=>?@[\\]^_`{|}~"
paste(sort(unlist(strsplit(s, ""))), collapse = "")"""
RsortOrder = "_-,;:!?.()[]{}@*/\\&#%`^+<=>|~$0123456789aAbBcCdDeEfFgGhHiIjJkKlLmMnNoOpPqQrRsStTuUvVwWxXyYzZ"
#             ^--- result of running the R code online (TIO)
# print(''.join(sorted("_-,;:!?.()[]{}@*/\\&#%`^+<=>|~$0123456789aAbBcCdDeEfFgGhHiIjJkKlLmMnNoOpPqQrRsStTuUvVwWxXyYzZ")))
PythonSort = "!#$%&()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]^_`abcdefghijklmnopqrstuvwxyz{|}~"
# ===========================================
alphabet = RsortOrder
table = str.maketrans(alphabet, ''.join(sorted(alphabet)))
print(">>>",sorted(["1&2","1_2"], key=lambda s: s.translate(table)))
printing
>>> ['1_2', '1&2']
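The same translation trick can be wrapped in a small helper so that the order string is passed in once. This is only a sketch: `make_sort_key` is a hypothetical name, and characters missing from the order string are left untranslated, so they compare by their ordinary code point.

```python
def make_sort_key(order):
    """Return a key function that ranks characters by their position in `order`."""
    if len(set(order)) != len(order):
        raise ValueError("order string must not contain duplicate characters")
    # Map the i-th character of `order` to the i-th smallest character,
    # so that plain string comparison follows the custom order.
    table = str.maketrans(order, "".join(sorted(order)))
    return lambda s: s.translate(table)

RsortOrder = "_-,;:!?.()[]{}@*/\\&#%`^+<=>|~$0123456789aAbBcCdDeEfFgGhHiIjJkKlLmMnNoOpPqQrRsStTuUvVwWxXyYzZ"
r_key = make_sort_key(RsortOrder)

result = sorted(["b", "B", "a", "1&2", "1_2", "A"], key=r_key)
print(result)
```

Keep in mind that this compares strings one character at a time. Locale-aware collation can weigh whole strings differently, for example for mixed-case words, so it may be worth checking the result against R on your own data.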
You can run the R code online using TIO, or generate your own RsortOrder by running the provided R code with your specific locale setting in R, as suggested by juanpa.arrivillaga in the comments to your question.
Alternatively, you can use Python's locale module to apply the same locale setting that R uses
(see https://stackoverflow.com/questions/1097908/how-do-i-sort-unicode-strings-alphabetically-in-python ):
import locale
# this reads the environment and inits the right locale
locale.setlocale(locale.LC_ALL, "")
# locale.strxfrm(string)
# Transforms a string to one that can be used in locale-aware comparisons.
# For example, strxfrm(s1) < strxfrm(s2) is equivalent to strcoll(s1, s2) < 0.
# This function can be used when the same string is compared repeatedly,
# e.g. when collating a sequence of strings.
print("###",sorted(["1&2","1_2"], key=locale.strxfrm))
prints (the exact order depends on the locale in effect):
### ['1_2', '1&2']
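For completeness, a sketch of the same locale-aware sort with an explicitly named locale instead of the environment default. The locale name `en_US.UTF-8` is an assumption and may not be installed on every system, so the code keeps the current locale if setting it fails. `functools.cmp_to_key(locale.strcoll)` is shown as an alternative to `locale.strxfrm`.

```python
import functools
import locale

try:
    # Assumed locale name; adjust it to the one your R session reports.
    locale.setlocale(locale.LC_COLLATE, "en_US.UTF-8")
except locale.Error:
    pass  # locale not available here: keep whatever is currently active

words = ["1&2", "1_2", "b", "A", "a"]  # hypothetical sample data

# Key-based variant: each string is transformed once.
by_strxfrm = sorted(words, key=locale.strxfrm)
# Comparison-based variant: strcoll is called for every pairwise comparison.
by_strcoll = sorted(words, key=functools.cmp_to_key(locale.strcoll))

print(locale.setlocale(locale.LC_COLLATE))  # show which locale is in effect
print(by_strxfrm)
print(by_strcoll)
```

No expected output is given here on purpose, since it depends on the locales installed on the machine.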