13

As the result of an aggregation, I came up with the following Spark DataFrame:

+------------+-----------------+-----------------+
|sale_user_id|gross_profit     |total_sale_volume|
+------------+-----------------+-----------------+
|       20569|       -3322960.0|     2.12569482E8|
|       24269|       -1876253.0|      8.6424626E7|
|        9583|              0.0|       1.282272E7|
|       11722|          18229.0|        5653149.0|
|       37982|           6077.0|        1181243.0|
|       20428|           1665.0|        7011588.0|
|       41157|          73227.0|        1.18631E7|
|        9993|              0.0|        1481437.0|
|        9030|           8865.0|      4.4133791E7|
|         829|              0.0|          11355.0|
+------------+-----------------+-----------------+

and the schema of the DataFrame is:

root
 |-- sale_user_id: string (nullable = true)
 |-- gross_profit: double (nullable = true)
 |-- total_sale_volume: double (nullable = true)

How can I disable scientific notation in the gross_profit and total_sale_volume columns?

edited by ZygD
asked by chessosapiens

2 Answers

23

The easiest way is to cast the double column to decimal, giving it an appropriate precision and scale:

from pyspark.sql.types import DecimalType

df.withColumn('total_sale_volume', df.total_sale_volume.cast(DecimalType(18, 2)))
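For instance, a minimal sketch applying the same cast to both double columns (precision 18 and scale 2 are illustrative choices; pick them to fit your data):

from pyspark.sql import functions as F
from pyspark.sql.types import DecimalType

df = (df
      .withColumn('gross_profit', F.col('gross_profit').cast(DecimalType(18, 2)))
      .withColumn('total_sale_volume', F.col('total_sale_volume').cast(DecimalType(18, 2))))
df.show()
# a value such as 2.12569482E8 now displays as 212569482.00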
Mariusz
  • Any idea how to do that without specifying the number of decimal places (exponents)? I mean, have it inferred? – Bruno Ambrozio Nov 09 '20 at 17:37
  • @BrunoAmbrozio You can always `.collect()` a dataframe, and then you have a pure python objects with more control on how these are printed (https://stackoverflow.com/questions/658763/how-to-suppress-scientific-notation-when-printing-float-values) – Mariusz Nov 10 '20 at 18:26
  • Right now I need pretty much the same, but for persisting the values in a file; however, I cannot set the precision. I'd appreciate it if someone has a solution. Here's the new question: https://stackoverflow.com/questions/64772851/how-to-load-big-double-numbers-in-a-pyspark-dataframe-and-persist-it-back-withou/64773207#64773207 – Bruno Ambrozio Nov 10 '20 at 18:56
  • DecimalType is also subject to scientific notation, depending on the precision and scale. – sabacherli Oct 14 '21 at 13:42
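A minimal sketch of the collect-and-format approach mentioned in the comments above (this assumes the aggregated result is small enough to bring to the driver; Python's 'f' format spec always prints fixed-point, never scientific, notation):

for row in df.collect():  # only safe for small, aggregated results
    # fixed-point formatting never falls back to scientific notation
    print(row.sale_user_id,
          format(row.gross_profit, 'f'),
          format(row.total_sale_volume, 'f'))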
-4

DecimalType is deprecated in Spark 3.0+.

If it is a StringType, cast it to DoubleType first and then finally to a bigint type. There is no need to set the precision:

from pyspark.sql.types import StringType, LongType  # LongType is Spark's bigint

df.withColumn('total_sale_volume', df.total_sale_volume.cast(StringType()).cast(LongType()))

Or, alternatively, without having to import anything:

df.withColumn('total_sale_volume', df.total_sale_volume.cast('string').cast('bigint'))
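A hedged sanity check of what this chain produces: printSchema confirms the resulting long (bigint) type, though whether a scientific-notation string such as '2.12569482E8' actually parses to a bigint depends on the Spark version's string-to-integral cast rules, and any fractional part is dropped:

df2 = df.withColumn('total_sale_volume',
                    df.total_sale_volume.cast('string').cast('bigint'))
df2.printSchema()
# root
#  |-- sale_user_id: string (nullable = true)
#  |-- gross_profit: double (nullable = true)
#  |-- total_sale_volume: long (nullable = true)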
Tomerikoo