I am having trouble putting together a query that filters an existing DataFrame to show the count of names that are the same for both males and females.
We assigned a name to be female if the numbers of women and men with that name were equal. Write a filter based on the df_ssa5 DataFrame to count and print out how many times this occurs and how many names there are in total.
df_ssa5 is a given DataFrame, defined as follows:
df_ssa5 = df_ssa4.groupBy("name").sum("F","M").withColumnRenamed("sum(F)","women").withColumnRenamed("sum(M)","men")
df_ssa5.show()
If anyone could help that'd be great.
The desired output would look something like this, but showing the names that are the same for both men and women, along with the count of how many times that occurs: