
I have 4 dataframes which only have one row and one column, and I would like to combine them into one dataframe. In Python I would do this using the zip function, but I need a way to do it in PySpark. Any suggestions?

Dataframes look like this:

+--------------------------+
|sum(sum(parcelUBLD_SQ_FT))|
+--------------------------+
|              1.13014806E8|
+--------------------------+

+---------------------+
|sum(parcelUBLD_SQ_FT)|
+---------------------+
|         1.13014806E8|
+---------------------+

+---------------+
|count(parcelID)|
+---------------+
|          45932|
+---------------+

+----------------+
|sum(parcelCount)|
+----------------+
|           45932|
+----------------+

and I would like it to look like this:

+--------------------------+---------------------+---------------+----------------+
|sum(sum(parcelUBLD_SQ_FT))|sum(parcelUBLD_SQ_FT)|count(parcelID)|sum(parcelCount)|
+--------------------------+---------------------+---------------+----------------+
|              1.13014806E8|         1.13014806E8|          45932|           45932|
+--------------------------+---------------------+---------------+----------------+
DBA108642
  • Possible duplicate of [Spark: Merge 2 dataframes by adding row index/number on both dataframes](https://stackoverflow.com/questions/40508489/spark-merge-2-dataframes-by-adding-row-index-number-on-both-dataframes) – Chris Apr 29 '19 at 17:39
  • Your dataframes have just one value each ? – eliasah Apr 30 '19 at 06:12

1 Answer

Since you clearly specified that all the dataframes have exactly one row, you can use a cross join to get the desired output:

df1.crossJoin(df2).crossJoin(df3).crossJoin(df4)
Ranga Vure