I am following this solution from a Stack Overflow post. My only requirement is this: how can I round the values I want to sum to 2 digits after the decimal point before applying df.agg(sum())?

For example, I have values like the ones below, and the sum function adds them up as they are:

2.346
1.549

However, I want the values to be rounded to 2 digits after the decimal point, like

2.35
1.55

before summing them. How can I do that? I was not able to find any sub-function of sum, such as sum().round.

Note: I am using Spark 1.5.1 version.

zero323
Explorer

2 Answers

You can use bround:

import org.apache.spark.sql.functions._  // bround, sum

val df = Seq(2.346, 1.549).toDF("A")
df.select(bround(df("A"), 2)).show
+------------+
|bround(A, 2)|
+------------+
|        2.35|
|        1.55|
+------------+


df.agg(sum(bround(df("A"), 2)).as("appSum")).show
+------------------+
|            appSum|
+------------------+
|3.9000000000000004|
+------------------+
df.agg(sum(df("A")).as("exactSum")).show
+--------+
|exactSum|
+--------+
|   3.895|
+--------+
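
A side note on the trailing digits in appSum: they are binary floating-point residue from adding two doubles, not a sign that the rounding failed. As a minimal sketch in plain Scala (no Spark involved; the object name `ExactSum` and its helpers are made up for illustration), exact decimal arithmetic with `BigDecimal` avoids the residue, and an already-computed double total can be rounded once more for display:

```scala
import scala.math.BigDecimal.RoundingMode

// Hypothetical helper object, for illustration only; not part of Spark.
object ExactSum {
  // Round each decimal string to 2 places (HALF_UP), then add exactly.
  def roundedSum(values: Seq[String]): BigDecimal =
    values.map(v => BigDecimal(v).setScale(2, RoundingMode.HALF_UP)).sum

  // Round an already-computed Double total to 2 places for display.
  def tidy(total: Double): BigDecimal =
    BigDecimal(total).setScale(2, RoundingMode.HALF_UP)
}
```

For instance, `ExactSum.roundedSum(Seq("2.346", "1.549"))` gives `3.90`, and `ExactSum.tidy` turns a double total such as the appSum value into `3.90` as well.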
Psidom
  • Hi @Psidom, it looks like bround is only available from Spark 2.0. Is there anything similar available in 1.5.1? – Explorer Jan 17 '17 at 20:31
  • It seems that [round](https://spark.apache.org/docs/2.0.2/api/java/org/apache/spark/sql/functions.html#round(org.apache.spark.sql.Column,%20int)) is the more universal version and has been available since 1.5.0. You can give it a try. Not sure why there are two functions doing the same thing, though. – Psidom Jan 17 '17 at 20:37
  • @Psidom: Actually they're not the same. `bround` uses *HALF_EVEN rounding mode* while `round` uses *HALF_UP rounding mode* – Markus Aug 05 '20 at 11:04
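
The difference between the two modes only shows up on exact ties, which is why both functions agree on 2.346 and 1.549. The following is a minimal sketch in plain Scala using `BigDecimal`, not Spark's own implementation; the object `RoundingModes` and its methods are hypothetical names chosen for illustration:

```scala
import scala.math.BigDecimal.RoundingMode

// Hypothetical helper object, for illustration only; it mimics the two
// rounding modes in plain Scala and does not call Spark.
object RoundingModes {
  // HALF_UP: a tie is rounded away from zero (the mode attributed to round).
  def halfUp(v: String, scale: Int = 2): BigDecimal =
    BigDecimal(v).setScale(scale, RoundingMode.HALF_UP)

  // HALF_EVEN: a tie is rounded to the nearest even digit (the mode attributed to bround).
  def halfEven(v: String, scale: Int = 2): BigDecimal =
    BigDecimal(v).setScale(scale, RoundingMode.HALF_EVEN)
}
```

With these helpers, `"2.345"` becomes 2.35 under HALF_UP but 2.34 under HALF_EVEN, while a non-tie such as `"2.346"` becomes 2.35 under both.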

The above solution works for Spark 2.0. However, for folks like me who are still on 1.5.x, the following will work (I used the round function, as suggested by @Psidom):

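import org.apache.spark.sql.functions._  // round, sum

val df = Seq(2.346, 1.549).toDF("A")
// round (available since 1.5.0) instead of bround (2.0+); aliased for a stable header
df.select(round(df("A"), 2).as("rounded")).show
+-------+
|rounded|
+-------+
|   2.35|
|   1.55|
+-------+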

val total = df.agg(sum(round(df("A"), 2)).cast("double")).first.getDouble(0)
total: Double = 3.9000000000000004
Explorer