6

I have a number of dataframes (100) in a list as:

frameList = [df1,df2,..,df100]

Each dataframe has the two columns DateTime, Temperature.

I want to intersect all the dataframes on the common DateTime column and get all their Temperature columns combined/merged into one big dataframe: Temperature from df1, Temperature from df2, Temperature from df3, .., Temperature from df100.

(pandas merge doesn't work as I'd have to compute multiple (99) pairwise intersections).

smci
  • 32,567
  • 20
  • 113
  • 146
Zanam
  • 4,607
  • 13
  • 67
  • 143

3 Answers3

12

Use pd.concat, which works on a list of DataFrames or Series.

pd.concat(frameList, axis=1, join='inner')

This is better than using pd.merge, as pd.merge will copy the data pairwise every time it is executed. pd.concat copies only once. However, pd.concat only merges based on an axes, whereas pd.merge can also merge on (multiple) columns.

Olaf
  • 415
  • 3
  • 12
  • 1
    It keeps multiplie "DateTime" columns after concat. Is there a way to keep only 1 "DateTime". – Zanam Nov 10 '16 at 17:51
  • 1
    yes, make the DateTime the index, for each dataframe: `[df.set_index('DateTime', inplace=True) for df in frameList]`. – Olaf Nov 10 '16 at 19:36
4

you can try using reduce functionality in python..something like this

dfs = [df0, df1, df2, dfN]
df_final = reduce(lambda left,right: pd.merge(left,right,on='DateTime'), dfs)
Vivek Srinivasan
  • 2,687
  • 3
  • 17
  • 17
  • Can you please explain how this works through reduce? – Zanam Nov 10 '16 at 17:24
  • inner function lambda just merges two dataframe at a given time.....`reduce` takes a function and a `list` to work on....it takes two element from the list at a time works on it....the following explains in details http://www.python-course.eu/lambda.php.....hope you accept and upvote my answer.... – Vivek Srinivasan Nov 10 '16 at 17:29
  • I think my question was not clear. So, I am getting all the temperature columns merged into one column. I still want to keep them separate as I explained in the edit to my question. – Zanam Nov 10 '16 at 17:40
  • this will keep temperature column from each dataframe ...the result will be like this "DateTime" | Temperatue_1 | Temperature_2 ....| Temperature_n.....is that wat you wanted – Vivek Srinivasan Nov 10 '16 at 17:43
2

You could iterate over your list like this:

df_merge = frameList[0]
for df in frameList[1:]:       
    df_merge = pd.merge(df_merge, df, on='DateTime', how='inner')
Alex
  • 12,078
  • 6
  • 64
  • 74