
I'm trying to multiply two columns in Spark. Both columns are of type Double. Multiplying 26.0 by 0.001 gives 0.026000000000000002 instead of 0.026. How do I resolve this?

>>> df.printSchema()
root
 |-- age: double (nullable = true)
 |-- name: string (nullable = true)
 |-- mul: double (nullable = false)


>>> df.withColumn('res', df['age']*df['mul']).show()
+----+--------+-----+--------------------+
| age|    name|  mul|                 res|
+----+--------+-----+--------------------+
|25.0|   Ankit|0.001|               0.025|
|22.0|Jalfaizy|0.001|               0.022|
|20.0| saurabh|0.001|                0.02|
|26.0|    Bala|0.001|0.026000000000000002|
+----+--------+-----+--------------------+

Thanks

Yash

3 Answers


Round the column to 4 decimal places:

import pyspark.sql.functions as F

df = df.withColumn("res", F.round(F.col("res"), 4))
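Applied to the question's dataframe, the rounding can also be folded into the multiplication in a single step (a sketch, assuming the same df and column names as in the question):

import pyspark.sql.functions as F

# multiply and round to 4 decimal places in one pass;
# 26.0 * 0.001 then displays as 0.026 rather than 0.026000000000000002
df.withColumn('res', F.round(df['age'] * df['mul'], 4)).show()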

Convert it to Float:

from pyspark.sql.types import StructType, StructField, FloatType

# single-column schema using single-precision FloatType
table_schema = StructType([
    StructField("value", FloatType(), True)])
df = spark.createDataFrame(
    [
        (0.026000000000000002,)
    ], table_schema
)
df.show()  # the float value displays as 0.026
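The same idea applies to the question's dataframe without rebuilding the schema: cast the computed column instead (a sketch, assuming the same df; note that float is single precision, so this shortens the printed value by discarding precision rather than correcting it):

# cast the double-precision product down to single-precision float;
# as in the example above, the value then displays as 0.026
df.withColumn('res', (df['age'] * df['mul']).cast('float')).show()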
Addy

These are floating-point errors. A simple 1.1 - 1.0 gives 0.10000000000000009 in plain Python (and likewise in PySpark).
You can find more information about them here or in this Stack Overflow answer.

Rounding off to an appropriate number of decimal places is the simplest solution to this problem.
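If rounding after the fact is not acceptable, exact decimal arithmetic avoids the error at the source (a sketch, assuming the same df as in the question; DecimalType(10, 4) is an illustrative precision/scale choice, not something from the original post):

from pyspark.sql.types import DecimalType

# cast both factors to exact decimals before multiplying;
# decimal multiplication is exact, so 26.0000 * 0.0010 yields exactly 0.026
df.withColumn(
    'res',
    df['age'].cast(DecimalType(10, 4)) * df['mul'].cast(DecimalType(10, 4))
).show()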

Cena