I am searching for all .csv files located in subfolders with glob, like so:

import glob
import os

def scan_for_files(path):
    file_list = []
    for path, dirs, files in os.walk(path):
        for d in dirs:
            for f in glob.iglob(os.path.join(path, d, '*.csv')):
                file_list.append(f)
    return file_list

If I call:

path = r'/data/realtimedata/trades/bitfinex/'
scan_for_files(path)

I get the correct recursive list of files:

['/data/realtimedata/trades/bitfinex/btcusd/bitfinex_btcusd_trades_2018_05_12.csv',
 '/data/realtimedata/trades/bitfinex/btcusd/bitfinex_btcusd_trades_2018_05_13.csv',
 '/data/realtimedata/trades/bitfinex/btcusd/bitfinex_btcusd_trades_2018_05_15.csv',
 '/data/realtimedata/trades/bitfinex/btcusd/bitfinex_btcusd_trades_2018_05_11.csv',
 '/data/realtimedata/trades/bitfinex/btcusd/bitfinex_btcusd_trades_2018_05_09.csv',
 '/data/realtimedata/trades/bitfinex/btcusd/bitfinex_btcusd_trades_2018_05_10.csv',
 '/data/realtimedata/trades/bitfinex/btcusd/bitfinex_btcusd_trades_2018_05_08.csv',
 '/data/realtimedata/trades/bitfinex/btcusd/bitfinex_btcusd_trades_2018_05_14.csv',
 '/data/realtimedata/trades/bitfinex/ethusd/bitfinex_ethusd_trades_2018_05_14.csv',
 '/data/realtimedata/trades/bitfinex/ethusd/bitfinex_ethusd_trades_2018_05_12.csv',
 '/data/realtimedata/trades/bitfinex/ethusd/bitfinex_ethusd_trades_2018_05_10.csv',
 '/data/realtimedata/trades/bitfinex/ethusd/bitfinex_ethusd_trades_2018_05_08.csv',
 '/data/realtimedata/trades/bitfinex/ethusd/bitfinex_ethusd_trades_2018_05_09.csv',
 '/data/realtimedata/trades/bitfinex/ethusd/bitfinex_ethusd_trades_2018_05_15.csv',
 '/data/realtimedata/trades/bitfinex/ethusd/bitfinex_ethusd_trades_2018_05_11.csv',
 '/data/realtimedata/trades/bitfinex/ethusd/bitfinex_ethusd_trades_2018_05_13.csv']

However, when I pass the actual sub-directory containing the files I want, it returns an empty list. Any idea why this is happening? Thanks.

path = r'/data/realtimedata/trades/bitfinex/btcusd/'
scan_for_files(path)  # returns: []

swifty
  • use https://docs.python.org/3/library/os.html#os.walk – Rachit kapadia May 15 '18 at 06:17
  • If you're on Python 3.5+, a one-liner like glob.glob(path, recursive=True) would do the trick. For a more complete answer and older Python versions, see here: https://stackoverflow.com/questions/2186525/use-a-glob-to-find-files-recursively-in-python – Lukasz Tracewski May 15 '18 at 06:20
  • @LukaszTracewski Thanks, forgot that existed. Nice suggestion. – cs95 May 15 '18 at 06:25

1 Answer

Looks like btcusd is a bottom-level directory. That means that when you call os.walk with the r'/data/realtimedata/trades/bitfinex/btcusd/' path, the dirs variable will be an empty list [], so the inner loop for d in dirs: does not execute at all.
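
A quick way to see this for yourself, as a minimal sketch: the snippet below builds a throwaway tree in a temporary directory (the btcusd folder and trades.csv file are stand-ins made up for illustration, as is the walk_entries helper) and compares what os.walk yields for the parent with what it yields for the leaf.

```python
import os
import tempfile

def walk_entries(top):
    """Return os.walk's (root, dirs, files) tuples with sorted name lists."""
    return [(root, sorted(dirs), sorted(files)) for root, dirs, files in os.walk(top)]

with tempfile.TemporaryDirectory() as base:
    # Hypothetical layout: <base>/btcusd/trades.csv
    leaf = os.path.join(base, "btcusd")
    os.mkdir(leaf)
    open(os.path.join(leaf, "trades.csv"), "w").close()

    parent_entries = walk_entries(base)
    leaf_entries = walk_entries(leaf)

print(parent_entries[0][1])  # dirs seen from the parent: ['btcusd']
print(leaf_entries[0][1])    # dirs seen from the leaf: []
print(leaf_entries[0][2])    # files seen from the leaf: ['trades.csv']
```

From the leaf, the files are right there in the files list, but a loop over dirs never gets a chance to look at them.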

My advice would be to rewrite your function to iterate over the files directly rather than the directories. os.walk yields every directory in the tree in turn, starting with the top one, so every file is reached eventually.

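import os

def scan_for_files(path):
    file_list = []
    # os.walk yields `path` itself first, then every directory below it
    for root, _, files in os.walk(path):
        for f in files:
            # keep only CSV files, joined with the directory they live in
            if f.endswith('.csv'):
                file_list.append(os.path.join(root, f))

    return file_list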
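
If you would rather keep a shell-style pattern than test the file name yourself, one possible variant is to filter each directory's files list with fnmatch.filter from the standard library. This is only a sketch: scan_for_pattern and the temporary btcusd layout are hypothetical names used for illustration.

```python
import fnmatch
import os
import tempfile

def scan_for_pattern(path, pattern='*.csv'):
    """Collect files under `path` whose names match a shell-style pattern."""
    matches = []
    for root, _, files in os.walk(path):
        # fnmatch.filter keeps only the names that match the pattern
        for name in fnmatch.filter(files, pattern):
            matches.append(os.path.join(root, name))
    return matches

with tempfile.TemporaryDirectory() as base:
    # Hypothetical leaf directory holding two CSVs and one unrelated file
    leaf = os.path.join(base, 'btcusd')
    os.mkdir(leaf)
    for name in ('a.csv', 'b.csv', 'notes.txt'):
        open(os.path.join(leaf, name), 'w').close()

    found = sorted(os.path.basename(p) for p in scan_for_pattern(leaf))

print(found)  # ['a.csv', 'b.csv']
```

Because the walk starts at the directory you pass in, calling it on the leaf directory works the same way as calling it on the parent.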

However, on more recent versions of Python (3.5+), you can use a recursive glob:

def scan_for_files(path):
    return glob.glob(os.path.join(path, '**', '*.csv'), recursive=True)
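
With recursive=True, the '**' component matches zero or more directory levels, so this version should behave the same whether you point it at the parent or at the leaf directory. A small check under made-up assumptions (a temporary tree with a hypothetical btcusd folder):

```python
import glob
import os
import tempfile

def scan_for_files(path):
    # '**' with recursive=True matches zero or more directory levels
    return glob.glob(os.path.join(path, '**', '*.csv'), recursive=True)

with tempfile.TemporaryDirectory() as base:
    # Hypothetical layout: <base>/btcusd/{a.csv, b.csv, notes.txt}
    leaf = os.path.join(base, 'btcusd')
    os.mkdir(leaf)
    for name in ('a.csv', 'b.csv', 'notes.txt'):
        open(os.path.join(leaf, name), 'w').close()

    from_parent = sorted(os.path.basename(p) for p in scan_for_files(base))
    from_leaf = sorted(os.path.basename(p) for p in scan_for_files(leaf))

print(from_parent)  # ['a.csv', 'b.csv']
print(from_leaf)    # ['a.csv', 'b.csv']
```

Note that glob returns results in arbitrary order, which is why the example sorts them before printing.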

Source.

cs95