I have a PySpark DataFrame with a column named Numerical_fields. I computed this DataFrame elsewhere. Example:

 Numerical_fields| Age | Height | Weight 

Now I need to calculate the mean for each value in this column. I tried looping with for i in df.collect(), but how can I get the mean?

desertnaut

1 Answer

To get a DataFrame with the mean for each value in Numerical_fields, you can do the following:

avg_df = df.groupby(df.Numerical_fields).avg("Age", "Height", "Weight")

avg_df will now contain one row per unique value in Numerical_fields, with the averages of the other columns for that value. No loop over df.collect() is needed, since the aggregation runs inside Spark.

scr