Let's say I have the following DataFrame:
import org.apache.spark.sql.functions._
import spark.implicits._

val df = spark.sparkContext.parallelize(Seq(
  ("A", "12", 50),
  ("A", "13", 100),
  ("A", "14", 30),
  ("B", "15", 40),
  ("C", "16", 60),
  ("C", "17", 70)
)).toDF("Name", "Time", "Value")
and I pivot by 'Time':
val pivoted = df
  .groupBy($"Name")
  .pivot("Time")
  .agg(coalesce(sum($"Value"), lit(0)))

pivoted.show()
which results in:
+----+----+----+----+----+----+----+
|Name|  12|  13|  14|  15|  16|  17|
+----+----+----+----+----+----+----+
|   B|null|null|null|  40|null|null|
|   C|null|null|null|null|  60|  70|
|   A|  50| 100|  30|null|null|null|
+----+----+----+----+----+----+----+
So far, everything is fine. What I want is to add a column after column '17' that contains the sum of each row. So the expected output should be:
+----+----+----+----+----+----+----+----+
|Name|  12|  13|  14|  15|  16|  17| sum|
+----+----+----+----+----+----+----+----+
|   B|null|null|null|  40|null|null|  40|
|   C|null|null|null|null|  60|  70| 130|
|   A|  50| 100|  30|null|null|null| 180|
+----+----+----+----+----+----+----+----+
Naively, I've tried adding a 'withColumn', but it fails:
val pivotedWithSummation = df
  .groupBy($"Name")
  .pivot("Time")
  .agg(coalesce(sum($"Value"), lit(0)))
  .withColumn("summation", sum($"Value")) // fails: 'Value' no longer exists after the pivot
I came across this answer, but I couldn't apply it :/
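For what it's worth, the direction I imagine might work is summing the pivoted columns row by row, treating nulls as 0. Here is a rough sketch of what I mean (the 'valueColumns' / 'rowSum' names and the coalesce-to-zero handling are just my guesses, not something I've verified):

import org.apache.spark.sql.functions.{coalesce, col, lit}

// Sketch: take every pivoted column except 'Name', map nulls to 0,
// and add the columns up row-wise into a single 'sum' column.
val valueColumns = pivoted.columns.filter(_ != "Name")
val rowSum = valueColumns
  .map(c => coalesce(col(c), lit(0)))
  .reduce(_ + _)

val pivotedWithSum = pivoted.withColumn("sum", rowSum)
pivotedWithSum.show()

Is something along these lines the right approach, or does Spark offer a more direct way?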
I'm using Scala 2.11.8 and Spark 2.3.1.
Thanks in advance!