+----+----+--------+
| Id | M1 | trx    |
+----+----+--------+
| 1  | M1 | 11.35  |
| 2  | M1 | 3.4    |
| 3  | M1 | 10.45  |
| 2  | M1 | 3.95   |
| 3  | M1 | 20.95  |
| 2  | M1 | 25.55  |
| 1  | M1 | 9.95   |
| 2  | M1 | 11.95  |
| 1  | M1 | 9.65   |
| 1  | M1 | 14.54  |
+----+----+--------+
From the dataframe above, I can generate a histogram with the following code:
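import org.apache.spark.sql.functions.col

// Lowercase names: an uppercase name in a tuple pattern is treated as an existing identifier, not a new variable
val (ranges, counts) = df
  .select(col("trx"))
  .rdd.map(r => r.getDouble(0))
  .histogram(10) // 10 evenly spaced buckets between min and max
// ranges: Array[Double] = Array(3.4, 5.615, 7.83, 10.045, 12.26, 14.475, 16.69, 18.905, 21.12, 23.335, 25.55)
// counts: Array[Long] = Array(2, 0, 2, 3, 0, 1, 0, 1, 0, 1)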
counts contains the number of elements in each range.
But how can I get the sum of the elements, sum(trx), in each range, like this:
sumOfTrx: Array[Double] = Array(7.35, 0, 19.6, xx, xx, xx, xx, xx, xx, 25.55)
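To make the desired result concrete, here is a minimal sketch in plain Scala (no Spark required) of how per-bucket sums could be computed for evenly spaced buckets. The names BucketSums, bucketIndex and sumsPerBucket are hypothetical and not part of any Spark API. The sketch assumes the same bucketing convention that histogram(10) documents: equal-width buckets between the minimum and maximum, with the last bucket closed on the right. Values that sit exactly on a bucket boundary may land differently from Spark's own result because of floating-point rounding.

```scala
// Sketch only: per-bucket sums for evenly spaced buckets, on a local collection.
object BucketSums {
  // Index of the bucket that contains x. The last bucket includes max,
  // so the index is clamped to buckets - 1.
  def bucketIndex(x: Double, min: Double, max: Double, buckets: Int): Int = {
    val width = (max - min) / buckets
    if (width == 0.0) 0 // all values identical: everything goes in bucket 0
    else math.min(((x - min) / width).toInt, buckets - 1)
  }

  // Sum of the values that fall into each bucket.
  def sumsPerBucket(values: Seq[Double], buckets: Int): Array[Double] = {
    val (min, max) = (values.min, values.max)
    val sums = Array.fill(buckets)(0.0)
    values.foreach { x => sums(bucketIndex(x, min, max, buckets)) += x }
    sums
  }

  def main(args: Array[String]): Unit = {
    // The trx column from the table above
    val trx = Seq(11.35, 3.4, 10.45, 3.95, 20.95, 25.55, 9.95, 11.95, 9.65, 14.54)
    println(sumsPerBucket(trx, 10).mkString(", "))
  }
}
```

On an RDD[Double], the same idea could presumably be expressed by mapping each value to (bucketIndex(x, min, max, 10), x) and then calling reduceByKey(_ + _), with min and max taken from the first and last entries of the boundary array returned by histogram. Buckets that receive no values would be missing from that result and would need to be filled in with 0.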