
I got the following result in PySpark using the .describe() function:

+------+-------+
|   _c0|    _c1|
+------+-------+
| count|2674686|
|  mean|    0.0|
|stddev|    0.0|
|   min|      0|
|   max|      0|
+------+-------+

I'm trying to convert the result to look like this:

+-------+----+------+---+---+
|  count|mean|stddev|min|max|
+-------+----+------+---+---+
|2674686| 0.0|   0.0|  0|  0|
+-------+----+------+---+---+

How can I do this? If it is hard, is it possible to swap the rows and columns when calling the describe() function?

powpow

2 Answers


groupBy() and pivot functions can be used here:

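from pyspark.sql.functions import first

# An empty groupBy() treats the whole DataFrame as one group, so the pivot yields a single row.
your_df.groupBy().pivot("_c0").agg(first("_c1")).show()
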
Enayat

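You can use the .pivot() function for that. Note that pivot() is not a DataFrame method; it has to be called on the result of groupBy().

Example:

from pyspark.sql.functions import first

# Pivot the labels in _c0 into columns, taking the matching _c1 value for each.
data_frame.groupBy().pivot("_c0").agg(first("_c1")).show()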
Robert Kossendey