Let's say I have the following two tables that I want to union:
+---------+-------+---+---+---+
|month_key|account|ch1|ch2|ch5|
+---------+-------+---+---+---+
| Aug| abc| 0| 1| 1|
+---------+-------+---+---+---+
+---------+-------+---+---+---+
|month_key|account|ch3|ch4|ch5|
+---------+-------+---+---+---+
| Sept| bcd| 1| 1| 0|
+---------+-------+---+---+---+
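For a reproducible example, the two DataFrames can be built like this (a minimal PySpark sketch; the schemas are just inferred from the values shown above):

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# The two frames from the tables above, with the column names and values as shown
df1 = spark.createDataFrame(
    [("Aug", "abc", 0, 1, 1)],
    ["month_key", "account", "ch1", "ch2", "ch5"],
)
df2 = spark.createDataFrame(
    [("Sept", "bcd", 1, 1, 0)],
    ["month_key", "account", "ch3", "ch4", "ch5"],
)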
How can I keep all the columns when doing the union? I'm getting the following results when I do:
df1.union(df2).show()
+---------+-------+---+---+---+
|month_key|account|ch1|ch2|ch5|
+---------+-------+---+---+---+
| Aug| abc| 0| 1| 1|
| Sept| bcd| 1| 1| 0|
+---------+-------+---+---+---+
In my actual dataset I'll have hundreds of channel columns, so I think it would be difficult to use a join where I have to list all of the common columns.
I'm using Spark version 2.4.3.
I know that unionByName with allowMissingColumns handles this in Spark 3.1 and above, but is there an alternative for my version?
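For context, this is the kind of manual workaround I'd like to avoid hand-rolling over hundreds of columns (a rough PySpark sketch; add_missing is just an illustrative helper name, and filling the missing channels with null rather than 0 is an assumption):

from pyspark.sql import functions as F

# Full column list: df1's columns first, then whatever df2 adds
all_cols = df1.columns + [c for c in df2.columns if c not in df1.columns]

def add_missing(df, cols):
    # Add any column the frame lacks as null, then align the column order
    for c in cols:
        if c not in df.columns:
            df = df.withColumn(c, F.lit(None))
    return df.select(*cols)

# Positional union works once both sides have the same columns in the same order
result = add_missing(df1, all_cols).union(add_missing(df2, all_cols))
result.show()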
Thanks for your help.