0

I have around 10 csv files in my folder. I want to create a 1:1 merge on one common column, called "ACCESS_ID".

After specifying the path directory, I create a DataFrame to put all the local csv files in:

os.chdir('C:/Users/xx/Downloads/merge/')
csvs_all = pd.DataFrame()

I can individually read in all the .csv files using pd.read_csv() function and store all the dataframes to a list, but I am trying to automate the process as much as possible.

for f in glob.glob('C:/Users/xx/Downloads/merge/*.csv'):
    df = pd.read_csv(f)
    csvs_all=pd.concat([dfs,df])

This code is not working and is creating multiple duplicate entries. Some of the column entries are exhibiting NA values. Is there a way to use pd.merge on multiple datasets/csv files? I would like to do the following:

csvs_all = pd.merge([all my csv files], on='ACCESS_ID')

JodeCharger100
  • 903
  • 1
  • 12
  • 25
  • 1
    also found this https://stackoverflow.com/questions/59272087/automating-pandas-dataframe-join-from-list duplicate maybe – ombk Nov 29 '20 at 01:40
  • 1
    [Never call `DataFrame.append` or `pd.concat` inside a for-loop. It leads to quadratic copying.](https://stackoverflow.com/a/36489724/1422451). Instead, build a list of data frames and `concat` **once** outside the loop. – Parfait Nov 29 '20 at 04:09

1 Answers1

0

Normal concatenation

df = pd.read_csv("file1.csv",index_col=0)
for i in range(1,3):
    new_df = pd.read_csv(f"file{i}.csv",index_col=0)
    df = pd.concat([df,new_df],axis=1)

Works.

    names   filtered    names   filtered    names   filtered
0   01011RT0TU7 1011000 01011RT0TU7 1011000 02011RT0TU7 2011000
1   11041NT4TU8 11041000    11041NT4TU8 11041000    11041NT4TU8 11041000
2   51391RST0U2 51391000    51391RST0U2 51391000    51391RST0U2 51391000
3   01011645RT0TU9  1011645 01011645RT0TU9  1011645 01011645RT0TU9  1011645
4   11311455TX0TU8  11311455    11311455TX0TU8  11311455    11311455TX0TU8  11311455
5   51041545ST3TU9  51041545    51041545ST3TU9  51041545    51041545ST3TU9  51041545

I know it looks ugly, but it works.

ombk
  • 2,036
  • 1
  • 4
  • 16