split on multiple characters in string

Question

I have a list of filenames that I need to sort based on a section within the string. However, it only works if I make the file extension part of my sorting list. I want this to work whether the file is a .jpg or a .png, so I am trying to split on both the '_' and the '.' characters.

sorting = ['FRONT', 'BACK', 'LEFT', 'RIGHT', 'INGREDIENTS', 'INSTRUCTIONS', 'INFO', 'NUTRITION', 'PRODUCT']

filelist = ['3006345_2234661_ENG_PRODUCT.jpg', '3006345_2234661_ENG_FRONT.jpg', '3006345_2234661_ENG_LEFT.jpg', '3006345_2234661_ENG_RIGHT.jpg', '3006345_2234661_ENG_BACK.jpg', '3006345_2234661_ENG_INGREDIENTS.jpg', '3006345_2234661_ENG_NUTRITION.jpg', '3006345_2234661_ENG_INSTRUCTIONS.jpg', '3006345_2234661_ENG_INFO.jpg']

sort = sorted(filelist, key = lambda x : sorting.index(x.re.split('_|.')[3]))

print(sort)

This returns the error "AttributeError: 'str' object has no attribute 're'"

What do I need to do to split on both the _ and . when splitting out my strings for sorting? I only want to use the split for the sorting, not for re-forming the strings.

Does this answer your question? [Split string based on a regular expression](https://stackoverflow.com/questions/10974932/split-string-based-on-a-regular-expression) — Michael Bianconi, Feb 14 '20 at 21:49
That gives an error "IndexError: list index out of range" which I figure means that it isn't making enough splits to get to the [3] index. — Micah Edelblut, Feb 14 '20 at 21:50
`re.split` takes the regular expression as its first argument and the input string as its second, so your syntax should be something like `re.split('_|.', x)[3]`, as mentioned below. — Guven Degirmenci, Feb 14 '20 at 21:54
_"AttributeError: 'str' object has no attribute 're'"_ I'm not sure what kind of answer you expect. Can you be more specific about what the issue is? — AMC, Feb 14 '20 at 23:33

mechanical_meat · Accepted Answer · 2020-02-14T23:38:44.780


Here's the fixed code:

import re
sorted_output = sorted(filelist, key=lambda x: sorting.index(re.split(r'_|\.', x)[3]))

The string to split should be passed as the second argument to re.split(). re.split() is a function in the re module, not a method of str, so x.re.split(...) fails with the AttributeError you saw. The first argument is the regular expression itself, which you had almost right.
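To see why index `[3]` picks out the section name, here is what `re.split()` returns for one filename, with the pattern first and the string second. If you prefer a method-style call like the one attempted in the question, that method lives on a compiled pattern object returned by `re.compile()`, not on `str`. The name `SPLITTER` is just an illustrative choice.

```python
import re

filename = '3006345_2234661_ENG_FRONT.jpg'

# Module-level function: pattern first, string second.
parts = re.split(r'_|\.', filename)
print(parts)     # ['3006345', '2234661', 'ENG', 'FRONT', 'jpg']
print(parts[3])  # FRONT

# Equivalent method-style call on a compiled pattern (SPLITTER is a hypothetical name).
SPLITTER = re.compile(r'_|\.')
print(SPLITTER.split(filename) == parts)  # True
```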

Also, you need to escape the . with a \ because the period is a special character in regular expressions that matches any single character (except a newline).
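A quick check of what the unescaped pattern from the question does: because `.` matches any character, every character becomes a separator and only empty strings are left, so `[3]` is `''` and `sorting.index('')` would raise a `ValueError`. Escaping the dot, or listing both separators in a character class such as `[_.]` (inside a class, `.` is literal), gives the intended split.

```python
import re

filename = '3006345_2234661_ENG_FRONT.jpg'

# Unescaped '.' matches any character, so every character acts as a separator.
unescaped = re.split('_|.', filename)
print(set(unescaped))                            # {''}
print(len(unescaped) == len(filename) + 1)       # True

# An escaped dot and a character class both split only on '_' and '.'.
escaped = re.split(r'_|\.', filename)
char_class = re.split(r'[_.]', filename)
print(escaped == char_class)                     # True
```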

Output:

In [13]: sorted(filelist,key=lambda x: sorting.index(re.split(r'_|\.',x)[3]))                       
Out[13]: 
['3006345_2234661_ENG_FRONT.jpg',
 '3006345_2234661_ENG_BACK.jpg',
 '3006345_2234661_ENG_LEFT.jpg',
 '3006345_2234661_ENG_RIGHT.jpg',
 '3006345_2234661_ENG_INGREDIENTS.jpg',
 '3006345_2234661_ENG_INSTRUCTIONS.jpg',
 '3006345_2234661_ENG_INFO.jpg',
 '3006345_2234661_ENG_NUTRITION.jpg',
 '3006345_2234661_ENG_PRODUCT.jpg']
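For a self-contained version that also covers the `.png` case from the question, here is a sketch with a mix of extensions (the `.png` filenames are made up for illustration). It swaps `sorting.index()` for a precomputed dict, `rank` (a name introduced here), so each lookup is constant time instead of a linear scan; with only nine labels the difference is negligible, so treat it as a style choice.

```python
import re

sorting = ['FRONT', 'BACK', 'LEFT', 'RIGHT', 'INGREDIENTS',
           'INSTRUCTIONS', 'INFO', 'NUTRITION', 'PRODUCT']

# Hypothetical mix of .jpg and .png files.
filelist = ['3006345_2234661_ENG_PRODUCT.png',
            '3006345_2234661_ENG_FRONT.jpg',
            '3006345_2234661_ENG_INFO.png',
            '3006345_2234661_ENG_BACK.jpg']

# Map each section label to its position once, instead of calling .index() per item.
rank = {label: i for i, label in enumerate(sorting)}

def section_rank(filename):
    # Split on '_' or '.', so the extension never ends up in the label.
    return rank[re.split(r'_|\.', filename)[3]]

sorted_output = sorted(filelist, key=section_rank)
print(sorted_output)
```

A label that is missing from `sorting` raises `KeyError` here, just as `sorting.index()` would raise `ValueError`.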

Edit: as @Todd mentions in the comments, if you also want files that share the same section to be ordered by filename (which starts with the numeric part), add the filename as a secondary sort key:

sorted(filelist,key=lambda x: [sorting.index(re.split(r'_|\.',x)[3]),x])
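To see the secondary key at work, here is a sketch with two files sharing the FRONT label (the second ID pair is made up). A tuple works as well as a list for the key: Python compares either element by element, so ties on the section rank fall through to the filename. Without the secondary key, `sorted()` is stable and would keep the two FRONT files in their input order. Note that the filename comparison is lexicographic, so numeric prefixes of different lengths would not sort numerically.

```python
import re

sorting = ['FRONT', 'BACK', 'LEFT', 'RIGHT', 'INGREDIENTS',
           'INSTRUCTIONS', 'INFO', 'NUTRITION', 'PRODUCT']

# Hypothetical list where two files share the FRONT label.
filelist = ['3006345_2234661_ENG_BACK.jpg',
            '3006399_2234700_ENG_FRONT.jpg',
            '3006345_2234661_ENG_FRONT.jpg']

def section_then_name(filename):
    # Primary key: position of the section label; secondary key: the full filename.
    section = re.split(r'_|\.', filename)[3]
    return (sorting.index(section), filename)

result = sorted(filelist, key=section_then_name)
print(result)
```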

edited Feb 14 '20 at 23:38

answered Feb 14 '20 at 21:53



This looks good. Just want to add that if you want to sort on the numeric prefix as a secondary key, use: [sorting.index(re.split(...)[3]), x] – Todd Feb 14 '20 at 22:35
@Todd: Thank you for your comment. I'm not sure I understand your code sample. Would you be able to elaborate a bit? – mechanical_meat Feb 14 '20 at 22:58

gladly: sorted(filelist,key=lambda x: [sorting.index(re.split(r'_|\.',x)[3]), x]) <-- I was proposing this. If the list of filenames had multiple items containing 'FRONT', this would sort them first by the indexes of *sorting* (primary key) and then alphanumerically by filename (secondary key). – Todd Feb 14 '20 at 23:33
Ah, nice. Got it. Updated answer to take this into account. Thanks again. – mechanical_meat Feb 14 '20 at 23:39
