
I have two DataFrames that each come from a crosstab, and I want to divide one by the other in PySpark. The idea is that I have two frequency tables and I want to turn them into percentages. How can this be done?

from pyspark.sql.functions import struct

# Frequency table of B against C
a = joined_df.crosstab("B", "C")
# Frequency table of the (A, B) pair against C
b = joined_df.withColumn("AB", struct("A", "B")).crosstab("AB", "C")

print("Check that the crosstabs have the same number of rows:", a.count(), b.count())

I want to do

c = a/b
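
For reference, this element-wise style is what pandas supports, since arithmetic aligns on index and column labels. A toy sketch with hypothetical data (one route, if the crosstabs are small, would be collecting them with toPandas() first):

import pandas as pd

# Hypothetical frequency tables standing in for the two crosstab results;
# division works here because the index and columns line up.
a_pd = pd.DataFrame({"c1": [2, 1], "c2": [4, 3]}, index=["x", "y"])
b_pd = pd.DataFrame({"c1": [4, 2], "c2": [8, 6]}, index=["x", "y"])

c_pd = a_pd / b_pd  # element-wise division, aligned by labels
print(c_pd)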
  • @zero323 do you know? – python_enthusiast Jul 22 '19 at 17:01
  • zero323 hasn't logged on to SO since October 2017. In any case, it would be easier to answer your question with a small [reproducible example](https://stackoverflow.com/questions/48427185/how-to-make-good-reproducible-apache-spark-examples). However, based on what I think you're asking, the answer is going to be no, because you can't do element-wise math operations on Spark DataFrames the way you can with pandas. – pault Jul 22 '19 at 18:37
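
Although element-wise division isn't supported, the same result can be computed with a join. A minimal sketch, assuming the two crosstabs share a join key with matching values and the same frequency columns (in the question the key columns would be "B_C" and "AB_C", whose values differ, so a common key would have to be derived first; the data below is hypothetical):

from pyspark.sql import SparkSession
from pyspark.sql.functions import col

spark = SparkSession.builder.getOrCreate()

# Hypothetical stand-ins for the two crosstab outputs.
freq_a = spark.createDataFrame([("x", 2, 4), ("y", 1, 3)], ["key", "c1", "c2"])
freq_b = spark.createDataFrame([("x", 4, 8), ("y", 2, 6)], ["key", "c1", "c2"])

value_cols = ["c1", "c2"]

# Join on the shared key, then divide column by column to get ratios.
ratio = (
    freq_a.alias("a")
    .join(freq_b.alias("b"), on="key")
    .select("key", *[(col("a." + c) / col("b." + c)).alias(c) for c in value_cols])
)
ratio.show()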

0 Answers