I have a folder. In this folder are files that look sort of like this:

'page_121.png', 'page_122.png', 'page_123.png'... etc

When running this bit of code in Python:

sorted(os.listdir())

Something like this is returned:

'page_276.png', 'page_277.png', 'page_278.png', 'page_279.png', 'page_28.png', 'page_280.png', 'page_281.png', 'page_282.png'

Notice how the names are sorted as strings, so after page_279.png comes page_28.png. That makes sense, but it is not the result I want. How can I get the returned list ordered by page number, smallest to largest, so that page_28.png falls between page_27.png and page_29.png?

  • 2
    Are you aware of how to extract a number from a string? If you can do that, pass a function that does it as the `key` argument to `sorted`. – ShadowRanger Dec 16 '21 at 03:30
  • 1
    You can use [natsort](https://pypi.org/project/natsort/) – deadshot Dec 16 '21 at 03:39
  • 1
    To the couple of answers below, I'll add `sorted(files, key=lambda s: int(re.sub(r'\D', '', s)))`. This just keys on all the digits in the filename, joined together and converted to an integer. – Gene Dec 16 '21 at 03:43
  • All these answers are working great. As it turns out, it had less to do with file sorting and more to do with natural sorting or human sorting in Python. I've associated the question with one that already covers that. – Daniel Jungwirth Dec 16 '21 at 03:57

2 Answers

You're sorting strings, which by default are compared lexicographically (character by character). Therefore, "279" comes before "28", because "7" sorts before "8" in the second position. What you want to do is sort the strings by the integer contained within each one. To accomplish this, you can do the following.

import os

def extract_integer(filename):
    # 'page_28.png' -> 'page_28' -> '28' -> 28
    return int(filename.split('.')[0].split('_')[1])

sorted(os.listdir(), key=extract_integer)
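
One caveat with a split-based key like this: os.listdir() returns every entry in the folder, so a name that does not follow the page_N.png shape (a hidden file or a stray text file, say) would make the key function raise. Below is a minimal sketch of a more forgiving variant, assuming it is acceptable to skip names that contain no digits at all. The names `page_number` and `sort_by_page` are hypothetical helpers made up for this illustration.

```python
import re

def page_number(filename):
    """Return the first run of digits in filename as an int, or None if there is none."""
    match = re.search(r'\d+', filename)
    return int(match.group()) if match else None

def sort_by_page(filenames):
    # Drop names without any digits, then sort the rest by their number.
    numbered = [name for name in filenames if page_number(name) is not None]
    return sorted(numbered, key=page_number)

print(sort_by_page(['page_279.png', 'notes.txt', 'page_28.png', 'page_3.png']))
# ['page_3.png', 'page_28.png', 'page_279.png']
```

With a real folder you would probably call it as sort_by_page(os.listdir()). Whether to skip or keep the non-matching names is a design choice that depends on what else lives in the directory.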
Daniel Walker

You may sort on the number, cast to an integer, using a lambda:

import re

files = ['page_276.png', 'page_277.png', 'page_278.png', 'page_279.png', 'page_28.png', 'page_280.png', 'page_281.png', 'page_282.png']
# Capture the digits between 'page_' and '.png' and sort on them as an integer
files.sort(key=lambda x: int(re.sub(r'page_(\d+)\.png', r'\1', x)))
print(files)

This prints:

['page_28.png', 'page_276.png', 'page_277.png', 'page_278.png',
 'page_279.png', 'page_280.png', 'page_281.png', 'page_282.png']
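
If the names are not all of the form page_N.png, the same idea can be generalised into a "natural sort" key. The following is only a sketch under that assumption, using the standard library alone. `natural_key` is a hypothetical helper name, and lowercasing the text chunks is a choice made here for illustration.

```python
import re

def natural_key(text):
    # re.split with a capturing group keeps the digit runs in the result,
    # so the list alternates: text, digits, text, digits, ..., text.
    parts = re.split(r'(\d+)', text)
    # Odd positions are always digit runs: compare those as integers.
    return [int(part) if index % 2 else part.lower()
            for index, part in enumerate(parts)]

names = ['chapter2_page_10.png', 'chapter2_page_9.png', 'chapter10_page_1.png']
print(sorted(names, key=natural_key))
# ['chapter2_page_9.png', 'chapter2_page_10.png', 'chapter10_page_1.png']
```

Because text and number chunks always sit at matching positions in the key, Python never has to compare a string with an integer, which would raise a TypeError.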
Tim Biegeleisen