0

I made a code that reads a series of CSV files within a folder. It reads all of the CSV files within a folder that I select. The CSV files are formatted like:

A1.CSV, A2.CSV, A3.CSV, ...., A10.CSV, A11.CSV, ..... A24.CSV, B1.CSV, B2.CSV, ...

and so forth.

The code reads the files and then appends them column-wise. However, when it reads the files, the order becomes: A1 > A10 > A11 > ... > A19 > A2 > A20 > A21 ...

Is there a way to make python read the files in alpha-numerical order like: A1 > A2 > A3 > ... > A10 > A11 >...

I can manually add zeroes to single digits (like A01), which is tedious as I have many files with single digits after an alphabet.

for files in os.listdir(full_path):
    if os.path.isfile(os.path.join(full_path, files)):
        df = pd.read_excel(files, 'Sheet1')[pkitems]
        df[''] = np.nan
        dfs.append(df)
data = pd.concat(dfs, axis=1)
Bong Kyo Seo
  • 381
  • 2
  • 7
  • 18
  • This seems to address your question: https://stackoverflow.com/questions/4813061/non-alphanumeric-list-order-from-os-listdir – Tacratis Feb 18 '19 at 02:36

2 Answers2

1

Use built-in list.sort with two keys:

file_list = [file for file in os.listdir(full_path) if file.endswith('.csv')]
file_list.sort(key=lambda x:(os.path.splitext(x)[0][0], int(os.path.splitext(x)[0][1:])))
print(file_list)
['A1.csv',
 'A2.csv',
 ...
 'A23.csv',
 'A24.csv',
 'B1.csv',
 'B2.csv',
 ...
 'B23.csv',
 'B24.csv']

Edit: In order to only read the files that end with '.csv'

Chris
  • 29,127
  • 3
  • 28
  • 51
  • I tried using sort but it gave me this error: ValueError: invalid literal for int() with base 10: 'k list' when I print(file_list). Is it because the integers after a character may not be integers? – Bong Kyo Seo Feb 18 '19 at 06:18
  • @BongKyoSeo I think there are files in your `full_path` that do not comply the format of `A0.csv` (i.e. one letter, some digits, and .csv). – Chris Feb 18 '19 at 06:23
  • Well, I do have some files with A00.csv. But double digits should be okay with [1:] part if I remember correctly. Alphabets may run from A to P. Integers run from 1 ~ 24. Besides these files, there are no other files within the folder that I select as full_path. – Bong Kyo Seo Feb 18 '19 at 06:25
  • @BongKyoSeo From the error message, `os.listdir` returned a file name that contains `k list`. Can you double check? I've also editted my answer to read files that end with '.csv' only. – Chris Feb 18 '19 at 06:27
  • oh I just realized that my code creates a folder with name pk list. I guess the error message printed k list from 'pk list'. – Bong Kyo Seo Feb 18 '19 at 06:31
  • I should have realized to use if file.endswith as you suggested. Now it works! thanks! – Bong Kyo Seo Feb 18 '19 at 06:33
  • @BongKyoSeo Glad to help :) – Chris Feb 18 '19 at 06:39
0

You can sort your list of files as built by os.listdir in the right order by ordering on the index number if you strip them of the 'A's and convert them to ints:

for files in os.listdir(full_path).sort(key=lambda f: int(f[1:])):
    ...

This assumes that all your file names start with a single character that can be ignored for the indes

mommermi
  • 982
  • 10
  • 18
  • This fails with many issues: `list.sort()` returns `None`. You are therefore iterating over `None`, which will raise `TypeError`. Furthermore, OP's files contain extensions which is not taken care here. Lastly, `sort` with one key will return `A1, B1, ...`, which is again not a desired output of OP's question – Chris Feb 18 '19 at 04:03