I've created a list of CSV files and cleaned them, but I'm stuck on merging them together. After cleaning, every CSV file has the same column labels. Some also have extra column labels. I need to merge the columns with the same name.

Here is an example of my code:

import glob
import os

import pandas as pd

os.listdir(os.getcwd())
filelist = glob.glob('*.csv')

for file in filelist:
    df = pd.read_csv(file)
    #cleaning code section
    print(df.head())
pd.concat(filelist)  # raises the TypeError shown below

I tried pd.concat(filelist) because I thought it could do that with lists, but I get this:

TypeError: cannot concatenate object of type '<class 'str'>'; only Series and DataFrame objs are valid

If that's the case, can I make my list into a DataFrame object or can I use something like merge or join?

Please send help!

  • `filelist` is a list of file paths... You'd have to keep a list of DataFrames. `dfs = []` before the for loop, then `dfs.append(df)` at the end of the loop body. Lastly, `pd.concat(dfs)` – Henry Ecker Jul 15 '21 at 03:41
  • Does this answer your question? [Import multiple csv files into pandas and concatenate into one DataFrame](https://stackoverflow.com/questions/20906474/import-multiple-csv-files-into-pandas-and-concatenate-into-one-dataframe) – Henry Ecker Jul 15 '21 at 03:41
  • Thanks for taking the time to help answer my question @Henry – Melissa Bowman Jul 15 '21 at 15:07
  • Lolz. I'm still getting used to commenting. Also I can't see the rest of your comment about the pd.concat(dfs). Can you restate that info? – Melissa Bowman Jul 15 '21 at 15:09
  • The linked duplicate has a complete code example, which is nearly identical to the answer below. It also expands on what I was saying in my comment. – Henry Ecker Jul 15 '21 at 15:20

1 Answer

import glob

import pandas as pd

filelist = glob.glob('*.csv')  # paths of the csv files in the current directory

dfs = []  # collect the DataFrames, not the file names
for file in filelist:
    file_df = pd.read_csv(file)

    # cleaning code section
    print(file_df.head())

    dfs.append(file_df)

# pd.concat returns a new DataFrame, so assign the result
df = pd.concat(dfs, ignore_index=True)  # ignore_index resets the index in the concatenated df

# One-liner (no cleaning)
df = pd.concat((pd.read_csv(file) for file in glob.glob('*.csv')), ignore_index=True)
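
On the "extra column labels" part of the question: `pd.concat` aligns columns by name, so shared columns are stacked and a column that exists in only some files is filled with NaN for the others. Here is a minimal sketch, assuming two made-up frames in place of the cleaned csv files (the column names `id`, `value` and `note` are hypothetical). Passing `join="inner"` keeps only the columns every frame has:

```python
import pandas as pd

# Hypothetical stand-ins for two cleaned csv files; `note` exists only in the first
a = pd.DataFrame({"id": [1, 2], "value": [10, 20], "note": ["x", "y"]})
b = pd.DataFrame({"id": [3], "value": [30]})

# Default join="outer": union of columns, missing cells become NaN
outer = pd.concat([a, b], ignore_index=True)

# join="inner": only the columns shared by all frames are kept
inner = pd.concat([a, b], ignore_index=True, join="inner")

print(outer)
print(inner)
```

Which of the two fits depends on whether the extra columns should survive the merge.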
filiabel
  • I was doing dfs = [] before my loop and appending the file, but when I tried to combine, I did it with just pd.concat(dfs). Why was it important to add ignore_index=True? It totally works btw. I just want to know what I was doing wrong. – Melissa Bowman Jul 15 '21 at 15:14
  • Also thank you for the help! I really appreciate it! ^_^ – Melissa Bowman Jul 15 '21 at 15:16
  • Wait (palm to face) I didn't set a variable for the pd.concat(list). -_- I got it. Thank you for guiding me in the right direction. – Melissa Bowman Jul 15 '21 at 23:22
  • @MelissaBowman Yes, you didn't save the csv files which you read into dataframes. Glad I could help, would appreciate if you accepted the answer :-) – filiabel Jul 16 '21 at 08:41
  • Thanks again! I just figured out how to accept your answer. ^_^ – Melissa Bowman Jul 21 '21 at 19:17
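
For the `ignore_index` question raised in the comments: without it, `pd.concat` still works, but each frame keeps its own row labels, so the combined index contains repeats. A small sketch, assuming two made-up frames (the `value` column is hypothetical):

```python
import pandas as pd

# Hypothetical frames standing in for two files read with pd.read_csv
a = pd.DataFrame({"value": [10, 20]})
b = pd.DataFrame({"value": [30, 40]})

kept = pd.concat([a, b])                      # each frame keeps its own 0, 1 labels
reset = pd.concat([a, b], ignore_index=True)  # rows are relabelled 0..n-1

print(kept.index.tolist())   # [0, 1, 0, 1]
print(reset.index.tolist())  # [0, 1, 2, 3]
print(kept.loc[0])           # two rows, one from each frame
```

Either way, `pd.concat` returns a new DataFrame and leaves the list untouched, so the result has to be assigned to a variable.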