I have a folder at C:\Users\Documents\folder that contains 500 randomly named subfolders. Each subfolder holds multiple CSV files. I want to import only the CSV files whose names contain the word client from those subfolders and concatenate them into one DataFrame (let's hope I won't run into any RAM issues).

Can someone help? Many Thanks.

FlyUFalcon

1 Answer

I think this should do it:

import os
import pandas as pd

source_dir = r'C:\Users\Documents\folder'

my_list = []

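# Walk source_dir and every subfolder beneath it.
for root, dirnames, filenames in os.walk(source_dir):
    for f in filenames:
        # Keep only files whose name contains 'client'.
        if 'client' in f:
            my_list.append(pd.read_csv(os.path.join(root, f)))

# Stack all imported frames into a single DataFrame.
concatted_df = pd.concat(my_list)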
Matthew Borish
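
The loop above matches any file whose name contains 'client', whatever its extension, and pd.concat raises a ValueError when the list is empty. Below is a minimal sketch of a stricter variant, assuming the target files all end in .csv. The function name load_client_csvs and its keyword parameter are hypothetical and not part of the original answer.

```python
from pathlib import Path

import pandas as pd


def load_client_csvs(source_dir, keyword="client"):
    """Concatenate every *.csv under source_dir whose file name contains keyword."""
    frames = []
    # rglob searches all subfolders recursively; sorted() makes the order repeatable.
    for path in sorted(Path(source_dir).rglob("*.csv")):
        if keyword in path.name:
            frames.append(pd.read_csv(path))
    if not frames:
        # pd.concat raises "No objects to concatenate" on an empty list,
        # so return an empty DataFrame instead.
        return pd.DataFrame()
    # ignore_index=True gives the combined frame a fresh 0..n-1 index.
    return pd.concat(frames, ignore_index=True)
```

It could be called as load_client_csvs(r'C:\Users\Documents\folder'), or with a narrower keyword such as '_client_2020_' if the plain word matches too many files.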
  • Thanks for this. I tried your code and got ValueError: No objects to concatenate. – FlyUFalcon Apr 17 '20 at 20:55
  • Perhaps the .csv format needs to be specified somewhere? – FlyUFalcon Apr 17 '20 at 21:05
  • This solution works if the filenames contain the word client. The .csv extension is deduced from the if 'client' in f line. Do the files contain the word client, or the folders? Are there other .csvs in the folders that should not be included? – Matthew Borish Apr 17 '20 at 21:12
  • Thanks. Each subfolder contains 3 or 4 CSV files, and only 1 or 2 of them contain the word 'client', so some CSVs should be excluded. I ran the code again and got UnicodeDecodeError: 'utf-8' codec can't decode byte 0xbc in position 2: invalid start byte. – FlyUFalcon Apr 17 '20 at 21:24
  • Thanks. I don't know why, but I needed more specific text to make this code run. The new text needs to be something like '_client_2020_'. – FlyUFalcon Apr 17 '20 at 21:42
  • You can alter the 'client' text string as needed. As for your encoding error, maybe this will help: https://stackoverflow.com/questions/18171739/unicodedecodeerror-when-reading-csv-file-in-pandas-with-python – Matthew Borish Apr 17 '20 at 21:54
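
The UnicodeDecodeError reported in the comments means at least one matched file is not valid UTF-8. One possible workaround, sketched below, is to try several encodings in turn. The helper name read_csv_with_fallback and the candidate encodings are assumptions, since the thread never states the files' real encoding. Note that latin-1 accepts any byte sequence, so it can decode without error and still show the wrong characters.

```python
import pandas as pd


def read_csv_with_fallback(path, encodings=("utf-8", "cp1252", "latin-1")):
    """Try each encoding in order and return the first DataFrame that decodes."""
    last_error = None
    for enc in encodings:
        try:
            return pd.read_csv(path, encoding=enc)
        except UnicodeDecodeError as exc:
            # Remember the failure and move on to the next candidate encoding.
            last_error = exc
    if last_error is None:
        raise ValueError("encodings must not be empty")
    # Every candidate failed: re-raise the most recent decoding error.
    raise last_error
```

In the answer's loop, this helper could stand in for the direct pd.read_csv call. If the actual encoding is known, passing it straight to pd.read_csv through the encoding argument is simpler.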