0

I want to combine these 3 dataframes, based on their ID columns, and get the below output. I am after a short way that I can use it for combining many more number of dataframes later.

Input:

+---+---+---+
| ID|  a|  b|
+---+---+---+
|  A|  1|  1|
|  B|  2|  2|
+---+---+---+
+---+---+---+
| ID|  c|  d|
+---+---+---+
|  A|  3|333|
|  B|  4|444|
+---+---+---+
+---+---+---+
| ID|  e|  f|
+---+---+---+
|  A|555|  5|
|  B|666|  6|
+---+---+---+

Output:

+---+---+---+---+---+---+---+
| ID|  a|  b|  c|  d|  e|  f|
+---+---+---+---+---+---+---+
|  A|  1|  1|  3|333|555|  5|
|  B|  2|  2|  4|444|666|  6|
+---+---+---+---+---+---+---+

Answer: For anyone who might find it useful later!

# create list of dataframes
list_df = [df1, df2, df3]

# merge all at once
df_all = reduce(lambda x, y: x.join(y, on="ID"), list_df)
Zrb
  • 1
  • 1
  • This question may have already been answered here (unfortunately it looks like it's not possible, or difficult): https://stackoverflow.com/questions/49763009/stack-spark-dataframes-horizontally-equivalent-to-pandas-concat-or-r-cbind – Stephen Oct 15 '22 at 03:00
  • Did I miss something? Is that just df1 inner join df2 inner join df3 based on id column? – Jonathan Lam Oct 15 '22 at 03:14
  • @JonathanLam it is what you said, but I am after a shorter way that I don't have to bring all one after another. So, that I can use it later for more number of dataframes. – Zrb Oct 15 '22 at 06:13

1 Answers1

0

The 3 dataframes can be jioned according to the ID column.

df = df1.join(df2, 'ID').join(df3, 'ID')
过过招
  • 3,722
  • 2
  • 4
  • 11