
Is there an equivalent in PySpark that allows me to do a similar operation as in Pandas:

pd.concat([df1, df2], axis=1)

I have tried several methods; so far none of them seems to work. The concatenation they do is vertical, and I need to concatenate multiple Spark DataFrames into one whole DataFrame.

If I use union or unionAll, the DataFrames get stacked vertically, as one single set of columns, which is not useful for my use case. I also tried this example (it did not work either):

from functools import reduce
from pyspark.sql import DataFrame

def unionAll(*dfs):
    return reduce(DataFrame.unionAll, dfs)

Any help will be greatly appreciated.

  • Does this answer your question? [Stack Spark dataframes horizontally - equivalent to pandas concat or r cbind](https://stackoverflow.com/questions/49763009/stack-spark-dataframes-horizontally-equivalent-to-pandas-concat-or-r-cbind) – 过过招 Feb 11 '22 at 01:55
  • Does this answer your question? [Stack Spark dataframes horizontally - equivalent to pandas concat or r cbind](https://stackoverflow.com/questions/49763009/stack-spark-dataframes-horizontally-equivalent-to-pandas-concat-or-r-cbind) – blackbishop Feb 11 '22 at 09:40
  • Thank you, I see now there isn't a simplified way to do it, as Pandas handles that piece. Besides, converting between Pandas and PySpark just crashes everything. Again, thank you both; the post was really helpful on this matter. – Wendy Velasquez Feb 11 '22 at 14:58
  • I did find a way to join multiple Spark DataFrames, though, using the crossJoin function (a minimal sketch follows below). – Wendy Velasquez Feb 13 '22 at 21:19
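
For reference, a minimal PySpark sketch of the crossJoin approach mentioned in the last comment. The DataFrames `df_left` and `df_right` here are hypothetical, and the sketch assumes `df_right` holds a single row (e.g. aggregate values), so the cross join simply appends its columns to every row of `df_left`:

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Hypothetical example data: a per-row DataFrame and a single-row DataFrame of aggregates.
df_left = spark.createDataFrame([(1, "a"), (2, "b")], ["id", "letter"])
df_right = spark.createDataFrame([(100.0, 3.5)], ["total", "avg"])

# crossJoin appends df_right's columns to every row of df_left;
# with a single-row df_right this behaves like a horizontal concat.
result = df_left.crossJoin(df_right)
result.show()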

1 Answer


The best way I have found is to join the DataFrames using a unique id; org.apache.spark.sql.functions.monotonically_increasing_id() happens to do the job.

The following code is in Scala (the PySpark version would be essentially the same):

import org.apache.spark.sql.functions.monotonically_increasing_id

// Tag each DataFrame with an id column, then join them all on that id.
Seq(df1, df2, df3).map(_.withColumn("id", monotonically_increasing_id()))
                  .reduce((a, b) => a.join(b, "id"))

This gives the horizontally concatenated DataFrame.
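
Since the question is about PySpark, a rough Python equivalent of the Scala snippet might look like the sketch below (assuming df1, df2, df3 are already defined). Note that joining on monotonically_increasing_id() only lines rows up correctly when the DataFrames share the same partitioning and row order, so treat this as a sketch rather than a general-purpose solution:

from functools import reduce
from pyspark.sql.functions import monotonically_increasing_id

# df1, df2, df3 are the DataFrames to concatenate horizontally.
dfs = [df1, df2, df3]

# Tag each DataFrame with an id column, then join them all on that id.
with_ids = [df.withColumn("id", monotonically_increasing_id()) for df in dfs]
result = reduce(lambda a, b: a.join(b, "id"), with_ids)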

Brown nightingale
  • Thank you for doing this :)! I did take a different approach, but it will be good to have your response as a reference if I come across this type of problem again. – Wendy Velasquez Mar 25 '23 at 00:04