13

As the result of an aggregation, I came up with the following Spark DataFrame:

+------------+-----------------+-----------------+
|sale_user_id|gross_profit     |total_sale_volume|
+------------+-----------------+-----------------+
|       20569|       -3322960.0|     2.12569482E8|
|       24269|       -1876253.0|      8.6424626E7|
|        9583|              0.0|       1.282272E7|
|       11722|          18229.0|        5653149.0|
|       37982|           6077.0|        1181243.0|
|       20428|           1665.0|        7011588.0|
|       41157|          73227.0|        1.18631E7|
|        9993|              0.0|        1481437.0|
|        9030|           8865.0|      4.4133791E7|
|         829|              0.0|          11355.0|
+------------+-----------------+-----------------+

and the schema of the DataFrame is:

root
 |-- sale_user_id: string (nullable = true)
 |-- gross_profit: double (nullable = true)
 |-- total_sale_volume: double (nullable = true)

How can I disable scientific notation in the gross_profit and total_sale_volume columns?

edited by ZygD
asked by chessosapiens

2 Answers

23

The easiest way is to cast the double column to decimal, giving it an appropriate precision and scale:

from pyspark.sql.types import DecimalType

df.withColumn('total_sale_volume', df.total_sale_volume.cast(DecimalType(18, 2)))
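For instance, a minimal sketch applying the same cast to both double columns (precision 18 and scale 2 are illustrative choices; pick them to fit your data):

from pyspark.sql import functions as F
from pyspark.sql.types import DecimalType

df = (df
      .withColumn('gross_profit', F.col('gross_profit').cast(DecimalType(18, 2)))
      .withColumn('total_sale_volume', F.col('total_sale_volume').cast(DecimalType(18, 2))))
df.show()
# a value such as 2.12569482E8 now displays as 212569482.00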
Mariusz
  • Any idea how to do that without specifying the number of decimal places (exponents)? I mean, have it inferred? – Bruno Ambrozio Nov 09 '20 at 17:37
  • @BrunoAmbrozio You can always `.collect()` a dataframe, and then you have a pure python objects with more control on how these are printed (https://stackoverflow.com/questions/658763/how-to-suppress-scientific-notation-when-printing-float-values) – Mariusz Nov 10 '20 at 18:26
  • Right now I need pretty much the same, but for persisting the values in a file; however, I cannot set the precision. I'd appreciate it if someone has a solution. Here's the new question: https://stackoverflow.com/questions/64772851/how-to-load-big-double-numbers-in-a-pyspark-dataframe-and-persist-it-back-withou/64773207#64773207 – Bruno Ambrozio Nov 10 '20 at 18:56
  • DecimalType is also subject to scientific notation, depending on the precision and scale. – sabacherli Oct 14 '21 at 13:42
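A minimal sketch of the collect-and-format approach mentioned in the comments above (this assumes the aggregated result is small enough to bring to the driver; Python's 'f' format spec always prints fixed-point, never scientific, notation):

for row in df.collect():  # only safe for small, aggregated results
    # fixed-point formatting never falls back to scientific notation
    print(row.sale_user_id,
          format(row.gross_profit, 'f'),
          format(row.total_sale_volume, 'f'))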
-4

DecimalType is deprecated in Spark 3.0+.

If it is a StringType, cast it to DoubleType first and then finally to a bigint type. There is no need to set the precision:

from pyspark.sql.types import StringType, LongType  # LongType is Spark's bigint

df.withColumn('total_sale_volume', df.total_sale_volume.cast(StringType()).cast(LongType()))

Or, alternatively, without having to import anything:

df.withColumn('total_sale_volume', df.total_sale_volume.cast('string').cast('bigint'))
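A hedged sanity check of what this chain produces: printSchema confirms the resulting long (bigint) type, though whether a scientific-notation string such as '2.12569482E8' actually parses to a bigint depends on the Spark version's string-to-integral cast rules, and any fractional part is dropped:

df2 = df.withColumn('total_sale_volume',
                    df.total_sale_volume.cast('string').cast('bigint'))
df2.printSchema()
# root
#  |-- sale_user_id: string (nullable = true)
#  |-- gross_profit: double (nullable = true)
#  |-- total_sale_volume: long (nullable = true)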
Tomerikoo