Using Apache Spark 2.0.2, I have a table stored as Parquet that contains about 23 million rows and about 300 columns. One of the columns, total_price, is stored as double. If I execute:
select sum(total_price) from my_table;
+---------------+
|    total_price|
+---------------+
| 9.3923769592E8|
+---------------+
So this number, 9.3923769592E8, seems wrong.
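If it matters, I can also cast the sum to decimal to see the value without the scientific notation (decimal(20,2) is just my guess at a wide-enough precision):

select cast(sum(total_price) as decimal(20,2)) from my_table;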
But if I execute:
select year, sum(total_price) from my_table group by year;
+-----+------------------+
| year|       total_price|
+-----+------------------+
| 2017| 44510982.10004025|
| 2016|293320440.63992333|
| 2015|  311512575.890131|
| 2014| 289885757.2799143|
| 2013|          5192.319|
| 2012|2747.7000000000007|
+-----+------------------+
My assumption is that in the first query the double data type overflows, or something like that.
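To test that assumption, I suppose I could compute the sum both as double and as decimal in one query and compare the two (decimal(38,6) is an arbitrary choice of precision):

select sum(total_price) as double_sum,
       sum(cast(total_price as decimal(38,6))) as decimal_sum
from my_table;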
Why am I getting results with so many decimal places if the values are stored as #.##?
How can I fix the error in the first query?
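The only workaround I can think of for the extra decimals is rounding in the query itself, something like:

select year, round(sum(total_price), 2) as total_price
from my_table
group by year;

But that only hides the digits; I would still like to understand where they come from.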