I'm writing a script that iterates over folders, scans the files in each folder, and appends the data to a central CSV. I'm wondering whether it's better to open the CSV and append inside each iteration, or to open the CSV once and run the loop inside it, e.g.

import os
import folderstats

for dir_name in os.listdir('Some/Folder/Name'):
    df = folderstats.folderstats(f'Some/Folder/Name/{dir_name}', ignore_hidden=True)
    # Reopens and closes the CSV on every iteration
    with open('exported_data.csv', 'a', newline='') as f:
        df.to_csv(f, header=False)

or (I don't know whether this works, or whether the theory is right but the code is wrong)

# Opens the CSV once and keeps it open for the whole loop
with open('exported_data.csv', 'a', newline='') as f:
    for dir_name in os.listdir('Some/Folder/Name'):
        df = folderstats.folderstats(f'Some/Folder/Name/{dir_name}', ignore_hidden=True)
        df.to_csv(f, header=False)

Which is correct? I have about 100,000+ files to get data from and append.

Ari
  • The first proposed solution opens and then closes your 'exported_data.csv' file on every pass through the for loop, which makes it wasteful and slower. The second proposed solution doesn't have this problem. – xxx Mar 12 '19 at 01:48
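
To make the open-once pattern from the comment concrete, here is a minimal sketch that runs with the standard library only. It is not the asker's code: `export_file_stats` is a hypothetical helper, and the `os.walk` loop that records each file's path and size is only an assumed stand-in for `folderstats.folderstats`, so the columns will differ from what that library produces.

```python
import csv
import os


def export_file_stats(root, out_path):
    """Append one row (path, size in bytes) per file found under each subfolder of root.

    Hypothetical stand-in for the folderstats-based loop in the question.
    """
    rows_written = 0
    # Open the output once; newline='' lets the csv module control line endings.
    with open(out_path, 'a', newline='') as f:
        writer = csv.writer(f)
        for dir_name in sorted(os.listdir(root)):
            dir_path = os.path.join(root, dir_name)
            if not os.path.isdir(dir_path):
                continue  # skip plain files sitting directly in root
            for folder, _subdirs, files in os.walk(dir_path):
                for name in files:
                    if name.startswith('.'):
                        continue  # skips hidden files only, not hidden folders
                    path = os.path.join(folder, name)
                    writer.writerow([path, os.path.getsize(path)])
                    rows_written += 1
    return rows_written
```

Calling `export_file_stats('Some/Folder/Name', 'exported_data.csv')` would open the CSV a single time no matter how many folders are scanned. Because the file is opened in append mode, running it twice adds the same rows again.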

0 Answers