
I have 4 dataframes which only have one row and one column, and I would like to combine them into one dataframe. In Python I would do this using the zip function, but I need a way to do it in PySpark. Any suggestions?

Dataframes look like this:

+--------------------------+
|sum(sum(parcelUBLD_SQ_FT))|
+--------------------------+
|              1.13014806E8|
+--------------------------+

+---------------------+
|sum(parcelUBLD_SQ_FT)|
+---------------------+
|         1.13014806E8|
+---------------------+

+---------------+
|count(parcelID)|
+---------------+
|          45932|
+---------------+

+----------------+
|sum(parcelCount)|
+----------------+
|           45932|
+----------------+

and I would like it to look like this:

+--------------------------+---------------------+---------------+----------------+
|sum(sum(parcelUBLD_SQ_FT))|sum(parcelUBLD_SQ_FT)|count(parcelID)|sum(parcelCount)|
+--------------------------+---------------------+---------------+----------------+
|              1.13014806E8|         1.13014806E8|          45932|           45932|
+--------------------------+---------------------+---------------+----------------+
DBA108642
  • Possible duplicate of [Spark: Merge 2 dataframes by adding row index/number on both dataframes](https://stackoverflow.com/questions/40508489/spark-merge-2-dataframes-by-adding-row-index-number-on-both-dataframes) – Chris Apr 29 '19 at 17:39
  • Your dataframes have just one value each ? – eliasah Apr 30 '19 at 06:12

1 Answer

Since you clearly specified that all the dataframes have exactly one row, you can use a cross join to get the desired output:

df1.crossJoin(df2).crossJoin(df3).crossJoin(df4)
Ranga Vure