
I'm trying to multiply two columns in Spark. Both columns are of type Double. Multiplying 26.0 by 0.001 gives 0.026000000000000002 instead of 0.026. How do I resolve this?

>>> df.printSchema()
root
 |-- age: double (nullable = true)
 |-- name: string (nullable = true)
 |-- mul: double (nullable = false)


>>> df.withColumn('res', df['age']*df['mul']).show()
+----+--------+-----+--------------------+
| age|    name|  mul|                 res|
+----+--------+-----+--------------------+
|25.0|   Ankit|0.001|               0.025|
|22.0|Jalfaizy|0.001|               0.022|
|20.0| saurabh|0.001|                0.02|
|26.0|    Bala|0.001|0.026000000000000002|
+----+--------+-----+--------------------+

Thanks

Yash

3 Answers


Round the column to 4 decimal places:

import pyspark.sql.functions as F

df = df.withColumn("res", F.round(F.col("res"), 4))
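Applied to the question's dataframe, the rounding can also be folded into the multiplication in a single step (a sketch, assuming the same df and column names as in the question):

import pyspark.sql.functions as F

# multiply and round to 4 decimal places in one pass;
# 26.0 * 0.001 then displays as 0.026 rather than 0.026000000000000002
df.withColumn('res', F.round(df['age'] * df['mul'], 4)).show()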

Convert it to Float:

from pyspark.sql.types import StructType, StructField, FloatType

# single-column schema using single-precision FloatType
table_schema = StructType([
    StructField("value", FloatType(), True)])
df = spark.createDataFrame(
    [
        (0.026000000000000002,)
    ], table_schema
)
df.show()  # the float value displays as 0.026
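The same idea applies to the question's dataframe without rebuilding the schema: cast the computed column instead (a sketch, assuming the same df; note that float is single precision, so this shortens the printed value by discarding precision rather than correcting it):

# cast the double-precision product down to single-precision float;
# as in the example above, the value then displays as 0.026
df.withColumn('res', (df['age'] * df['mul']).cast('float')).show()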
Addy

These are floating-point errors. A simple 1.1 - 1.0 gives 0.10000000000000009 in plain Python (and likewise in PySpark).
You can find more information about them here or in this Stack Overflow answer.

Rounding off to an appropriate number of decimal places is the simplest solution to this problem.
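If rounding after the fact is not acceptable, exact decimal arithmetic avoids the error at the source (a sketch, assuming the same df as in the question; DecimalType(10, 4) is an illustrative precision/scale choice, not something from the original post):

from pyspark.sql.types import DecimalType

# cast both factors to exact decimals before multiplying;
# decimal multiplication is exact, so 26.0000 * 0.0010 yields exactly 0.026
df.withColumn(
    'res',
    df['age'].cast(DecimalType(10, 4)) * df['mul'].cast(DecimalType(10, 4))
).show()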

Cena