317

I have written the following Python code:

#!/usr/bin/python
# -*- coding: utf-8 -*-

import os, glob

path = '/home/my/path'
for infile in glob.glob( os.path.join(path, '*.png') ):
    print(infile)

Now I get this:

/home/my/path/output0352.png
/home/my/path/output0005.png
/home/my/path/output0137.png
/home/my/path/output0202.png
/home/my/path/output0023.png
/home/my/path/output0048.png
/home/my/path/output0069.png
/home/my/path/output0246.png
/home/my/path/output0071.png
/home/my/path/output0402.png
/home/my/path/output0230.png
/home/my/path/output0182.png
/home/my/path/output0121.png
/home/my/path/output0104.png
/home/my/path/output0219.png
/home/my/path/output0226.png
/home/my/path/output0215.png
/home/my/path/output0266.png
/home/my/path/output0347.png
/home/my/path/output0295.png
/home/my/path/output0131.png
/home/my/path/output0208.png
/home/my/path/output0194.png

In which way is it ordered?

To clarify: I am not asking how to sort the results (I know about sorted). I want to know in which order the results come back by default.

It might help you to get my ls -l output:

-rw-r--r-- 1 moose moose 627669 2011-07-17 17:26 output0005.png
-rw-r--r-- 1 moose moose 596417 2011-07-17 17:26 output0023.png
-rw-r--r-- 1 moose moose 543639 2011-07-17 17:26 output0048.png
-rw-r--r-- 1 moose moose 535384 2011-07-17 17:27 output0069.png
-rw-r--r-- 1 moose moose 543216 2011-07-17 17:27 output0071.png
-rw-r--r-- 1 moose moose 561776 2011-07-17 17:27 output0104.png
-rw-r--r-- 1 moose moose 501865 2011-07-17 17:27 output0121.png
-rw-r--r-- 1 moose moose 547144 2011-07-17 17:27 output0131.png
-rw-r--r-- 1 moose moose 530596 2011-07-17 17:27 output0137.png
-rw-r--r-- 1 moose moose 532567 2011-07-17 17:27 output0182.png
-rw-r--r-- 1 moose moose 553562 2011-07-17 17:27 output0194.png
-rw-r--r-- 1 moose moose 574065 2011-07-17 17:27 output0202.png
-rw-r--r-- 1 moose moose 552197 2011-07-17 17:27 output0208.png
-rw-r--r-- 1 moose moose 559809 2011-07-17 17:27 output0215.png
-rw-r--r-- 1 moose moose 549046 2011-07-17 17:27 output0219.png
-rw-r--r-- 1 moose moose 566661 2011-07-17 17:27 output0226.png
-rw-r--r-- 1 moose moose 561678 2011-07-17 17:27 output0246.png
-rw-r--r-- 1 moose moose 525550 2011-07-17 17:27 output0266.png
-rw-r--r-- 1 moose moose 565715 2011-07-17 17:27 output0295.png
-rw-r--r-- 1 moose moose 568381 2011-07-17 17:28 output0347.png
-rw-r--r-- 1 moose moose 532768 2011-07-17 17:28 output0352.png
-rw-r--r-- 1 moose moose 535818 2011-07-17 17:28 output0402.png

It is not ordered by filename or size.

Other links: glob, ls

martineau
Martin Thoma
  • 2
    The final answer seems to be that the `ls` command itself sorts files by name. 'ls -U' gives an unordered list of files in "directory order". – Brian Peterson Oct 01 '13 at 22:38
  • 5
    On Windows it was sorted, so I just assumed that was always the case. Now on Ubuntu it cost me some debugging. Note to self: read the API! :0) – Yuri Feldman Jan 17 '19 at 12:05
  • 1
    The behaviour is the same with `os.listdir`: *nix systems return files in a non-alphabetical order, and (shame on me for being surprised!) [this is explicit in the documentation](https://docs.python.org/3/library/os.html?highlight=os%20listdir#os.listdir): "The list is in arbitrary order". – Joël Apr 26 '19 at 06:42

12 Answers

601

The order is arbitrary, but you can sort the results yourself.

Sorted by name:

import glob
sorted(glob.glob('*.png'))

Sorted by modification time:

import glob
import os
sorted(glob.glob('*.png'), key=os.path.getmtime)

Sorted by size:

import glob
import os
sorted(glob.glob('*.png'), key=os.path.getsize)

etc.
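
As a minimal, self-contained sketch of the three sort keys mentioned in this answer: the temporary directory, the file names, the sizes, and the timestamps set with os.utime are all made up for this illustration and are not taken from the question.

```python
import glob
import os
import tempfile

BASE_TIME = 1_600_000_000  # arbitrary epoch timestamp used only for this example


def make_sample_files(folder):
    """Create three hypothetical .png files with distinct sizes and mtimes."""
    # (name, size in bytes, offset in seconds added to BASE_TIME)
    samples = [("b.png", 30, 10), ("a.png", 10, 30), ("c.png", 20, 20)]
    for name, size, offset in samples:
        path = os.path.join(folder, name)
        with open(path, "wb") as handle:
            handle.write(b"x" * size)
        # Set access and modification time explicitly so the result is predictable.
        os.utime(path, (BASE_TIME + offset, BASE_TIME + offset))


def names(paths):
    """Strip the directory part so the results are easy to read."""
    return [os.path.basename(p) for p in paths]


with tempfile.TemporaryDirectory() as folder:
    make_sample_files(folder)
    matches = glob.glob(os.path.join(folder, "*.png"))  # arbitrary order

    by_name = names(sorted(matches))
    by_mtime = names(sorted(matches, key=os.path.getmtime))
    by_size = names(sorted(matches, key=os.path.getsize))

print(by_name)   # ['a.png', 'b.png', 'c.png']
print(by_mtime)  # ['b.png', 'c.png', 'a.png']
print(by_size)   # ['a.png', 'c.png', 'b.png']
```

Passing reverse=True to sorted gives the opposite order, for example newest files first when sorting by modification time.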

John La Rooy
  • 2
    I have files whose names are just integers, without an extension, so I use `files = glob.glob('teksty/*')`. Is ordering by name guaranteed? – andilabs Mar 13 '14 at 07:44
  • 5
    @mgalgs No, that was not the question I really meant to ask. What I wanted to know was answered by Xion. – Martin Thoma Dec 22 '15 at 16:33
  • And what about sorting by creation time? It lists the newest files first for me. How can I get a list from the oldest to the newest files? Thank you! – joaquindev Feb 20 '18 at 18:04
  • 3
    Note that getmtime and getsize are relatively expensive - doing this for a lot of files may take a while.. – drevicko Apr 03 '18 at 15:35
  • Excellent! It also works with `pathlib.Path`, e.g. `pathlib.Path('.').glob('*')` – imbr Aug 29 '22 at 18:21
155

It is probably not sorted at all and simply uses the order in which the entries appear in the filesystem, i.e. the order you get from ls -U. (At least on my machine, ls -U produces the same order as the list of glob matches.)

Xion
  • 5
    Yes, unless it makes a special effort, it will simply show the entries as the operating system provides them. Like the Unix command "find", it just dumps the entries in the order they come from the data structure used by the underlying filesystem. You should not make any assumptions about the ordering, even if the files seem to appear in creation order. – Raúl Salinas-Monteagudo Mar 28 '19 at 14:53
76

By checking the source code of glob.glob you see that it internally calls os.listdir, described here:

http://docs.python.org/library/os.html?highlight=os.listdir#os.listdir

Key sentence:

os.listdir(path='.')
Return a list containing the names of the entries in the directory given by path. The list is in arbitrary order. It does not include the special entries '.' and '..' even if they are present in the directory.

Arbitrary order.
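
To illustrate that relationship, here is a rough sketch that mimics a single-directory glob by filtering the names from os.listdir with fnmatch. It is a simplification written for this example, not the actual CPython implementation; the helper name simple_glob and the sample file names are made up.

```python
import fnmatch
import glob
import os
import tempfile


def simple_glob(folder, pattern):
    """Hypothetical stand-in for glob.glob(os.path.join(folder, pattern)).

    It filters the names returned by os.listdir, so it inherits whatever
    order the operating system hands back for the directory.
    """
    names = os.listdir(folder)  # documented as being in arbitrary order
    return [os.path.join(folder, n) for n in fnmatch.filter(names, pattern)]


with tempfile.TemporaryDirectory() as folder:
    for name in ("output0352.png", "output0005.png", "notes.txt"):
        open(os.path.join(folder, name), "w").close()

    ours = simple_glob(folder, "*.png")
    real = glob.glob(os.path.join(folder, "*.png"))

# Compare as sets: both calls find the same files, and neither promises an order.
same_files = set(ours) == set(real)
print(same_files)
```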

β.εηοιτ.βε
Ray Toal
25

Order is arbitrary, but there are several ways to sort them. One of them is the following:

# First, get the files (glob.glob1 is an undocumented helper that returns bare file names;
# img_folder and output_image_format are assumed to be defined elsewhere):
import glob
import re
files = glob.glob1(img_folder, '*' + output_image_format)
# To sort the files by the first run of digits in the file name:
files = sorted(files, key=lambda x: int(re.findall(r"(\d+)", x)[0]))
Zoe
April
  • What does your answer contribute in comparison to the existing answers? – Martin Thoma Dec 09 '18 at 18:59
  • 2
    @MartinThoma I have an issue with sorted not sorting the filenames if the integers present in the files are not zero padded. The sorting starts at 1000, goes up to whatever the highest integer is and then starts back over from the smallest integer. If I zero pad the numbers, just calling sorted on the files sorts them perfectly. So I think this solution solves the problem when sorted alone doesn't work. – Will.Evo Sep 27 '19 at 20:01
  • 7
    @Will.Evo Try using [`natsort`](https://pypi.org/project/natsort/): `from natsort import natsorted; files = natsorted(files)`. – Martin Thoma Sep 27 '19 at 20:45
  • Your answer helped ! – Vineet Jan 19 '20 at 19:10
18

I had a similar issue: glob was returning a list of file names in an arbitrary order, but I wanted to step through them in numerical order as indicated by the file name. This is how I achieved it.

My files were returned by glob something like this (written as raw strings so that the backslashes stay literal):

myList = [r"c:\tmp\x\123.csv", r"c:\tmp\x\44.csv", r"c:\tmp\x\101.csv", r"c:\tmp\x\102.csv", r"c:\tmp\x\12.csv"]

To sort the list in place, I created a key function (this relies on running under Windows, where os.path treats the backslash as a separator):

import os

def sortKeyFunc(s):
    return int(os.path.basename(s)[:-4])

This function takes the file name without its four-character ".csv" extension and converts it to an integer. I then called the sort method on the list:

myList.sort(key=sortKeyFunc)

This left the list in this order:

[r"c:\tmp\x\12.csv", r"c:\tmp\x\44.csv", r"c:\tmp\x\101.csv", r"c:\tmp\x\102.csv", r"c:\tmp\x\123.csv"]
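
A portable variant of this idea, as a sketch: it uses the standard ntpath module so that the Windows-style paths are parsed the same way on any platform, and it uses splitext instead of slicing off the last four characters. The function name numeric_key is made up for this example.

```python
import ntpath  # Windows path rules, usable on any platform


def numeric_key(path):
    """Return the file name without its extension as an integer."""
    stem, _extension = ntpath.splitext(ntpath.basename(path))
    return int(stem)


# Raw strings keep the backslashes literal.
paths = [r"c:\tmp\x\123.csv", r"c:\tmp\x\44.csv", r"c:\tmp\x\101.csv",
         r"c:\tmp\x\102.csv", r"c:\tmp\x\12.csv"]

ordered = sorted(paths, key=numeric_key)
print(ordered)
```

This only works when every file name is purely numeric; a name such as "notes.csv" would make int raise a ValueError.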
Hornbydd
  • 2
    I think it is more elegant to use `os.path.splitext(os.path.basename(s))[0]` instead of `os.path.basename(s)[:-4]`, so the function definition becomes: `def sortKeyFunc(s): return int(os.path.splitext(os.path.basename(s))[0])` – ePandit May 06 '20 at 11:07
16

glob.glob() is a wrapper around os.listdir(), so the underlying OS is responsible for delivering the data. In general, you cannot make any assumption about the ordering here. The basic assumption is: no ordering. If you need the results sorted, sort them at the application level.

9

Following @John La Rooy's solution, sorting the images with sorted(glob.glob('*.png')) does not work for me; the output list is still not ordered by name.

However, sorted(glob.glob('*.png'), key=os.path.getmtime) works perfectly.

I am a bit confused about why sorting by name does not work here.

Thanks to @Martin Thoma for posting this great question and to @John La Rooy for the helpful solutions.

Haoyu Wang
  • 2
    I think the alphabetical order is better for this solution, because the file dates may change differently on Linux. So if the order number is used in the file name (with the same number of digits), it is also possible to use alphabetical order: `sorted(glob.glob('*.png'), key=os.path.basename)` – s3n0 Mar 01 '21 at 08:56
  • For some reason key=os.path.basename does not work for me, but os.path.getmtime does. I find this peculiar myself, but just wanted to share what I faced. – Suprateem Banerjee Jan 05 '22 at 19:07
8

At least in Python 3, you can also do this:

import os, re, glob

path = '/home/my/path'
files = glob.glob(os.path.join(path, '*.png'))
files.sort(key=lambda x:[int(c) if c.isdigit() else c for c in re.split(r'(\d+)', x)])
for infile in files:
    print(infile)

This should sort your list of strings in natural order, i.e. runs of digits are compared as numbers rather than character by character.
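
To show what the key function in this answer does, here is a small sketch that contrasts plain string sorting with the digit-aware key. The name natural_key and the sample file names are made up for illustration.

```python
import re


def natural_key(text):
    """Split text into alternating non-digit and digit chunks.

    Digit chunks become ints so that "10" sorts after "9".
    """
    return [int(chunk) if chunk.isdigit() else chunk
            for chunk in re.split(r"(\d+)", text)]


names = ["output10.png", "output9.png", "output100.png", "output1.png"]

plain = sorted(names)                     # character-by-character comparison
natural = sorted(names, key=natural_key)  # digit runs compared as numbers

print(plain)    # ['output1.png', 'output10.png', 'output100.png', 'output9.png']
print(natural)  # ['output1.png', 'output9.png', 'output10.png', 'output100.png']
```

Because re.split with a capturing group always yields a non-digit chunk first (possibly empty), strings and integers line up at the same positions in every key, so the lists can be compared without a TypeError.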

Breit
5

I used the built-in sorted to solve this problem:

from pathlib import Path

p = Path('/home/my/path')
sorted(list(p.glob('**/*.png')))
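
A runnable sketch of the same approach, assuming a throwaway directory tree created just for this example (the file and folder names are made up). Path.glob returns a generator, and sorted accepts it directly, so the intermediate list is optional.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as folder:
    root = Path(folder)
    (root / "sub").mkdir()
    for relative in ("b.png", "a.png", "sub/c.png", "notes.txt"):
        (root / relative).touch()

    # "**/*.png" matches .png files in root and in any subdirectory.
    found = sorted(root.glob("**/*.png"))
    relative_names = [p.relative_to(root).as_posix() for p in found]

print(relative_names)  # ['a.png', 'b.png', 'sub/c.png']
```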
Hugo
2

If you're wondering what glob.glob did on your system in the past and cannot add a sorted call: the ordering will be consistent on Mac HFS+ filesystems and will be traversal order on other Unix systems. So it will likely have been deterministic unless the underlying filesystem was reorganized, which can happen when files are added, removed, renamed, moved, etc.

crizCraig
-1

Please try this code:

import glob, os, re
sorted(glob.glob(os.path.join(path, '*.png')), key=lambda x: int(re.findall(r"([0-9]+)\.png", x)[0]))
faris
-3
'''My file names are
"0_male_0.wav", "0_male_2.wav" ... "0_male_30.wav" ...
"1_male_0.wav", "1_male_2.wav" ... "1_male_30.wav" ...
"8_male_0.wav", "8_male_2.wav" ... "8_male_30.wav"

When I call wav.read(files) I want to read them in sorted order, i.e. "0_male_0.wav"
"0_male_1.wav"
"0_male_2.wav" ...
"0_male_30.wav"
"1_male_0.wav"
"1_male_1.wav"
"1_male_2.wav" ...
"1_male_30.wav"
So this is how I did it.

Just take all the files that start with "0_" as an example. For the other groups you can put this in a loop.
'''

import glob
import os

import scipy.io.wavfile as wav

# audio_folder_dir is assumed to be defined elsewhere as the folder that holds the .wav files.

def getKey(filename):
    # "0_male_12.wav" -> "0_male_12" (file name without directory and extension)
    file_text_name = os.path.splitext(os.path.basename(filename))[0]
    # Splitting on '_' gives three elements; the last one is the number to sort by.
    file_last_num = file_text_name.split('_')
    return int(file_last_num[2])

# Get all the .wav file names; the order is arbitrary.
file_names = [f for f in os.listdir(audio_folder_dir)
              if os.path.isfile(os.path.join(audio_folder_dir, f)) and f.endswith('.wav')]
# Find the files that belong to the "0_*" group.
filegroup0 = glob.glob(os.path.join(audio_folder_dir, '0_*'))
# Sort the files in group "0_*" by the last number in the file name.
filegroup0 = sorted(filegroup0, key=getKey)

That's how I did my particular case. Hope it's helpful.
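
As an alternative sketch to looping over each group, a single key that returns a tuple sorts all groups at once: tuples compare element by element, so the leading number decides first and the trailing number breaks ties. This swaps in a different technique from the per-group loop described in this answer; the function name group_and_index and the sample names are made up.

```python
import os


def group_and_index(filename):
    """Map "8_male_30.wav" to (8, 30): group number first, then file index."""
    stem = os.path.splitext(os.path.basename(filename))[0]
    group, _label, index = stem.split("_")
    return int(group), int(index)


names = ["1_male_2.wav", "0_male_30.wav", "0_male_2.wav",
         "8_male_0.wav", "1_male_0.wav", "0_male_0.wav"]

ordered = sorted(names, key=group_and_index)
print(ordered)
```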

  • 1
    You should change your answer to fit the question. – CodenameLambda Dec 22 '16 at 15:51
  • 1
    The question is not about sorting. I know (and I knew back then) how to sort. The question is about the default order. – Martin Thoma Dec 22 '16 at 16:07
  • 1
    Thank you for this code snippet, which may provide some immediate help. A proper explanation [would greatly improve](//meta.stackexchange.com/q/114762) its educational value by showing *why* this is a good solution to the problem, and would make it more useful to future readers with similar, but not identical, questions. Please [edit] your answer to add explanation, and give an indication of what limitations and assumptions apply. – Toby Speight Jun 21 '17 at 11:02