
I have two dataframes:

df1, which has columns col1 through col7

df2, which has columns col1 through col9

I need to union these two DataFrames, but the union fails because df2 has two extra columns (col8 and col9).

Is there another function I can use for this?

Ram Ghadiyaram
Eden T
    Does this answer your question? [How to perform union on two DataFrames with different amounts of columns in spark?](https://stackoverflow.com/questions/39758045/how-to-perform-union-on-two-dataframes-with-different-amounts-of-columns-in-spar) – Ram Ghadiyaram May 26 '20 at 21:43
    Clear duplicate; please consider closing the question. This is already explained in the answer at the link. What's wrong with it? You should really try to understand the code there. – user3190018 May 26 '20 at 21:47

1 Answer


Add the two missing columns (col8 and col9) to df1, the DataFrame that lacks them, and then go ahead with the union.

Import:

from pyspark.sql.functions import lit

If col8 and col9 are numbers, then do:

new_df = df1.withColumn("col8", lit(float('nan'))).withColumn("col9", lit(float('nan')))

Or, if col8 and col9 are strings, then do:

new_df = df1.withColumn("col8", lit("")).withColumn("col9", lit(""))

Now union new_df with df2. Note that union matches columns by position, not by name, so the columns must be in the same order in both DataFrames.
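The same idea can be generalized so that the missing columns are not hard-coded. The following is only a sketch, not a definitive implementation: the helper names `missing_columns` and `union_with_missing` are hypothetical, and it assumes that the type strings reported by `wide_df.dtypes` can be passed back to `cast`, which holds for common types such as `double` and `string`. It fills the gaps with typed nulls instead of NaN or empty strings, and uses `unionByName` so that columns are matched by name rather than by position.

```python
def missing_columns(target_cols, present_cols):
    """Return the names in target_cols that are absent from present_cols,
    preserving the order of target_cols."""
    present = set(present_cols)
    return [c for c in target_cols if c not in present]


def union_with_missing(narrow_df, wide_df):
    """Pad narrow_df with null columns so it has every column of wide_df,
    then union the two frames by column name.

    Hypothetical helper: assumes every column of narrow_df also exists
    in wide_df (as with df1 and df2 in the question).
    """
    # Imported here so missing_columns() stays usable without PySpark installed.
    from pyspark.sql.functions import lit

    # Map column name -> Spark type string, e.g. "double" or "string".
    types = dict(wide_df.dtypes)

    padded = narrow_df
    for name in missing_columns(wide_df.columns, narrow_df.columns):
        # A typed null keeps the column type of the wider frame.
        padded = padded.withColumn(name, lit(None).cast(types[name]))

    # unionByName matches columns by name, so column order does not matter.
    return padded.unionByName(wide_df)
```

With the frames from the question this would be called as `union_with_missing(df1, df2)`. On Spark 3.1 or later, `df1.unionByName(df2, allowMissingColumns=True)` should give the same result in a single call, filling the missing columns with nulls.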

Ani Menon