1

I have 3 pandas dataframes with a structure similar to:

pandas1:
date    star    col1    col2    
2019-01-30T00:32:18.128 tau_Cet 12  25
2019-01-30T00:34:05.525 tau_Cet 23  466
2019-01-03T03:54:59.886 HD_41248    344 997
2019-01-06T03:54:25.886 51_Peg  353 458


pandas2:
date    star    col3    col4    
2019-01-30T00:32:18.128 tau_Cet 1   2
2019-01-30T00:34:05.525 tau_Cet 2   4
2019-01-03T03:54:59.886 HD_41248    34  99
2019-01-06T03:54:25.886 51_Peg  33  45


pandas3:
date    star    col5    col 4   
2019-01-30T00:32:18.128 tau_Cet 4   2
2019-01-30T00:34:05.525 tau_Cet 5   4
2019-01-03T03:54:59.886 HD_41248    3   99
2019-01-06T03:54:25.886 51_Peg  333 45

How can I merge them all on the date field?

Some properties:

  • the data to construct each data frames is read from individual tab separated files

  • the date field will be unique to each line on each pandas, an d will be the linking key between all parameters;

  • the data in columns col1-5 might be from different types (string, float, date, etc)

  • columns col1-5 might or not have the same title. If the title is the same, the table value will also be the same.

the expected final result:

pandasMerged
date    star    col1    col2    col3    col4    col5
2019-01-30T00:32:18.128 tau_Cet     12  25  1   2   4
2019-01-30T00:34:05.525 tau_Cet     23  466 2   4   5
2019-01-03T03:54:59.886 HD_41248    344 997 34  99  3
2019-01-06T03:54:25.886 51_Peg      353 458 33  45  333

Any ideas/suggestions?

Thanks Jorge

jorgehumberto
  • 1,047
  • 5
  • 15
  • 33
  • 1
    Looking for [this](https://stackoverflow.com/questions/44327999/python-pandas-merge-multiple-dataframes)? – Erfan Feb 24 '20 at 15:00

1 Answers1

1

Use concat in list comprehension with DataFrame.set_index in first step, then remove duplicated columns by DataFrame.loc and inverted mask by Index.duplicated and last convert MultiIndex to columns by DataFrame.reset_index:

dfs = [pandas1,pandas2,pandas3]

df = pd.concat([x.set_index(['date','star']) for x in dfs], axis=1)
df = df.loc[:, ~df.columns.duplicated()].reset_index()
print (df)
                      date      star  col1  col2  col3  col4  col5
0  2019-01-30T00:32:18.128   tau_Cet    12    25     1     2     4
1  2019-01-30T00:34:05.525   tau_Cet    23   466     2     4     5
2  2019-01-03T03:54:59.886  HD_41248   344   997    34    99     3
3  2019-01-06T03:54:25.886    51_Peg   353   458    33    45   333
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252