sort lines by two digits

Question

Hello I have a text file that looks like this:

file '4. Can This Be Love.mp3' 
file '3. I Wanna Share It With You.mp3' 
file '8. Hold On.mp3' 
file '6. Playing With Fire.mp3' 
file '1. Take Me To The River.mp3' 
file '5. Giving It Up For You.mp3' 
file '10. Hooked On You.mp3' 
file '9. I Can'\''t Stop.mp3' 
file '7. Make It Together.mp3' 
file '2. Can'\''t Stop Myself.mp3'

I am trying to sort the file by the beginning string numbers so its in correct order from 1 to 10. My code looks like this:

#open file to read
shopping = open(songInputsFilepath, "r")
#get lines from file
lines = shopping.readlines()
#sort lines
lines.sort()
#close file
shopping.close()

#open file for writing
shoppingwrite = open(songInputsFilepath, "w")
#write sorted lines to file
shoppingwrite.writelines(lines)
#close file for writing
shoppingwrite.close()

It almost works, but misplaces the 10th track, and results in a file sorted like so:

file '1. Take Me To The River.mp3' 
file '10. Hooked On You.mp3' 
file '2. Can'\''t Stop Myself.mp3' 
file '3. I Wanna Share It With You.mp3' 
file '4. Can This Be Love.mp3' 
file '5. Giving It Up For You.mp3' 
file '6. Playing With Fire.mp3' 
file '7. Make It Together.mp3' 
file '8. Hold On.mp3' 
file '9. I Can'\''t Stop.mp3'

Is there some way to tell the lines.sort() function to use a regex expression to only sort each line based on the string which comes before the first 'period' character?

the answer describes a 3rd party library solution, ill consider it but id like to not add a new library if needed — Martin, Jul 03 '20 at 21:38
After reading Greg's answer I realized natural sort is overkill since you only need to sort based on one number. I voted to reopen the question. — wjandrea, Jul 03 '20 at 21:48

Greg Schmit · Accepted Answer · 2020-07-03T21:55:37.140

3

If you don't mind using regular expressions, this would work:

Add import re at the top of your file.

Change this:

lines.sort()

to this:

lines.sort(key=lambda x: int(re.findall('\d+', x)[0]))

Or using search (which should be faster), as @wjandrea suggested:

lines.sort(key=lambda x: int(re.search('\d+', x).group()))

You can also accomplish the same thing by setting the key to slice off the first characters which are numeric, though it might be slightly more verbose.

edited Jul 03 '20 at 21:55

answered Jul 03 '20 at 21:44

Greg Schmit

4,275
2
21
36

Why `re.findall('\d+', x)[0]` instead of `re.search('\d+', x)`? I haven't tested either of them, but it seems shorter. – wjandrea Jul 03 '20 at 21:45
Ah, search, that's much smarter. No reason not to use that! I expect it is faster, though I'm not sure if his dataset will grow much given that it appears to be songs written to disk :P – Greg Schmit Jul 03 '20 at 21:47
chage to this lines.sort(key=lambda x: int(re.search('\d+', x).group(1))) – Олег Гребчук Jul 03 '20 at 21:48
@ОлегГребчук `group(1)` will error out; `group()` is all that is needed. He could add capturing parenthesis but not really needed. – Greg Schmit Jul 03 '20 at 21:57

sort lines by two digits

1 Answers1