I have a huge DataFrame containing millions of rows. From these rows I derive k new
DataFrames, each of which has only 1 row and 1 column.
What's a good way to concatenate these k DataFrames so that I get a single 1 x k DataFrame,
that is, one with 1 row and k columns?
At first I used a crossJoin across all k DataFrames, such as
df1.crossJoin(df2).crossJoin(df3).crossJoin(dfk)
This resulted in a broadcast timeout error.
Later I moved to what I thought was a smarter solution:
df1.withColumn("temp_id", lit(0)).join(df2.withColumn("temp_id", lit(0)), "temp_id").drop("temp_id")
This resulted in a weirder, yet similar, broadcast timeout error.
The result that I really want is a new DataFrame with 1 row and k columns, which in numpy/pandas terms would be
pandas.concat(..., axis=1)
OR
np.hstack(.....)
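
To make the target shape concrete, here is a minimal pandas sketch of the result I am after. It only illustrates the desired output and is not a Spark solution; the list `parts` and the column names `c1`, `c2`, `c3` are hypothetical stand-ins for my k single-cell DataFrames (k = 3 here).

```python
import pandas as pd

# Hypothetical stand-ins for the k single-cell results (k = 3 here).
parts = [pd.DataFrame({f"c{i}": [i * 10]}) for i in range(1, 4)]

# Column-wise concatenation: every frame has the single index label 0,
# so the rows align and the result has 1 row and k columns.
wide = pd.concat(parts, axis=1)

print(wide.shape)          # (1, 3)
print(list(wide.columns))  # ['c1', 'c2', 'c3']
```

This works in pandas because each frame shares the same default index, so the columns are simply placed side by side.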