I have two DataFrames produced by crosstab, and I want to divide one by the other element-wise in PySpark. Both are frequency tables, and the goal is to get a table of percentages. How can this be done?
from pyspark.sql.functions import col, struct
a = joined_df.crosstab("B", "C")
b = joined_df.withColumn("AB", struct("A", "B")).crosstab("AB", "C")
print("Check that cross tabs have same number of rows: ", a.count(), b.count())
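What I would like to do is something like this (pseudocode, since a DataFrame does not support the / operator directly):
c = a / b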