4

I have 3 different dataframes that I want to join, using label and window as keys.

DataFrame1

Window  Label  FeatA
123      1        h
123      2        f

DataFrame2

Window  Label  FeatB
123      1      d 
123      2      s

DataFrame3

Window  Label  FeatC
123     1       d
123     2       c

Result

Window  Label  FeatA  FeatB  FeatC
123      1       h      d       d
123      2       f      s       c

I know how to join dataframes using pandas.concat but don't know how to specify keys. Any help would be greatly appreciated.

piRSquared
  • 285,575
  • 57
  • 475
  • 624
gbhrea
  • 494
  • 1
  • 9
  • 22

3 Answers3

3

You need to use the merge function for joining tables, for your case, since you have multiple data frames to join, you can put them into a list and then use the reduce from functools to merge them one by one:

import pandas as pd
from functools import reduce
reduce(lambda x, y: pd.merge(x, y, on = ['Window', 'Label']), [df1, df2, df3])

#  Window   Label   FeatA   FeatB   FeatC
# 0   123       1       h      d        d
# 1   123       2       f      s        c
Psidom
  • 209,562
  • 33
  • 339
  • 356
3

A pure pandas answer using pd.concat

pd.concat([df.set_index(['Window', 'Label']) for df in [df1_, df2_, df3_]],
          axis=1).reset_index()

enter image description here

piRSquared
  • 285,575
  • 57
  • 475
  • 624
1

you can use combine_first:

In[44]:df.combine_first(df1).combine_first(df2)[['Window','Label','FeatA','FeatB','FeatC']]
Out[44]: 
   Window  Label FeatA FeatB FeatC
0     123      1     h     d     d
1     123      2     f     s     c

or you can use merge:

In[30]:df.merge(df1,on=['Window','Label']).merge(df2,on=['Window','Label'])
Out[30]: 
   Window  Label FeatA FeatB FeatC
0     123      1     h     d     d
1     123      2     f     s     c
shivsn
  • 7,680
  • 1
  • 26
  • 33
  • 1
    `combine_first` is matching on `index` and `columns`. It will produce a dataframe that has an index that is `a.index.union(b.index)` and like wise for the columns. if you have `a = pd.DataFrame([], list('ab'), list('AB'))` and `b = pd.DataFrame([], list('cd'), list('CD'))` then `a.combine_first(b)` will be a 4x4 dataframe. The point is, your answer is ignoring the fact that the OP wants to join on `['Window', 'Label']` and is actually matching on the given integer index. – piRSquared Jul 24 '16 at 02:44
  • @piRSquared thanks for pointing I have lot to learn. – shivsn Jul 24 '16 at 04:48