
I was trying something like this:

val df = Seq((50984908,1000)).toDF("x","y")
val myExpression = "x * y"
df.withColumn("z",expr(myExpression)).show()

The multiplication overflows Integer; the result is not promoted to Long:

+--------+----+----------+
|       x|   y|         z|
+--------+----+----------+
|50984908|1000|-554699552|
+--------+----+----------+

How can these overflows be avoided? Is there a way for Spark to automatically infer a wider type where one is needed (e.g. Integer -> Long, Float -> Double/BigDecimal)?

1 Answer


In Scala, you can explicitly declare a numeric literal as Long by adding the L suffix. The columns are then inferred as Long, and the product no longer overflows for these values:

val df = Seq((50984908L,1000L)).toDF("x","y")
val myExpression = "x * y"
df.withColumn("z",expr(myExpression)).show()
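As a rough illustration of why the suffix matters, the value in the question is consistent with ordinary 32-bit wraparound, which plain Scala reproduces without Spark. The object name `IntOverflowDemo` is only a label for this sketch.

```scala
object IntOverflowDemo {
  val x: Int = 50984908
  val y: Int = 1000

  // Int * Int is computed in 32 bits, so the true product (50984908000) wraps around.
  val wrapped: Int = x * y

  // Widening one operand to Long first makes the whole multiplication 64-bit.
  val widened: Long = x.toLong * y

  def main(args: Array[String]): Unit = {
    println(wrapped) // -554699552, the same value as column z in the question
    println(widened) // 50984908000
  }
}
```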

If you need more control over column types, you can also build the DataFrame with createDataFrame and an explicit schema:

import org.apache.spark.sql.types._
import org.apache.spark.sql.Row
import org.apache.spark.sql.functions.expr

val someData = Seq(
  Row(50984908L, 1000L)
)

val myExpression = "x * y"

val someSchema = List(
  StructField("x", LongType, true),
  StructField("y", LongType, true)
)

val df = spark.createDataFrame(
  spark.sparkContext.parallelize(someData),
  StructType(someSchema)
)

df.withColumn("z",expr(myExpression)).show()
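
If the columns have to stay Integer, another option is to widen inside the expression itself. The following is a minimal sketch, assuming a local SparkSession; the object name `CastInExpression` and the helper `withProduct` are made up for illustration.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.expr

object CastInExpression {
  // Cast x to a 64-bit bigint before multiplying; Spark then promotes y to match,
  // so the product column z is computed as a bigint.
  def withProduct(df: DataFrame): DataFrame =
    df.withColumn("z", expr("cast(x as bigint) * y"))

  def main(args: Array[String]): Unit = {
    // Assumption: a local session is acceptable for trying this out.
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("cast-in-expression")
      .getOrCreate()
    import spark.implicits._

    val df = Seq((50984908, 1000)).toDF("x", "y") // Int columns, as in the question
    withProduct(df).show()

    spark.stop()
  }
}
```

The cast lives in the expression string, so the same approach works when myExpression is supplied at runtime, as long as whoever writes the expression adds the cast.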
Krzysztof Atłasik
  • Hello, thank you for your response. Is there a way to control the types when we read a CSV file into a DataFrame? `session.read.format("csv").option("header", "true").option("quote", dqualifier).option("ignoreTrailingWhiteSpace", value = true).option("multiLine", value = true).option("inferSchema", value = infer).load(fileLocation)` I know that inferSchema infers the schema types, but could we replace it with something more robust so that the column types are assigned correctly? – Sandhya Murali Aug 13 '20 at 15:47
  • https://stackoverflow.com/questions/39926411/provide-schema-while-reading-csv-file-as-a-dataframe – Krzysztof Atłasik Aug 13 '20 at 16:28
  • If it doesn't answer your question, please create another question. – Krzysztof Atłasik Aug 13 '20 at 16:29
  • You can also upvote/accept this answer if you feel it did answer your question. – Krzysztof Atłasik Aug 13 '20 at 16:29
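
For the CSV case raised in the comments, a minimal sketch of passing an explicit schema instead of relying on inferSchema might look like this. The column names x and y, the object name `CsvWithSchema`, and the helper `readCsv` are assumptions for illustration; the other reader options from the comment can be chained in the same way.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.types.{LongType, StructField, StructType}

object CsvWithSchema {
  // Assumed layout: two numeric columns that should be read as 64-bit longs.
  val schema: StructType = StructType(Seq(
    StructField("x", LongType, nullable = true),
    StructField("y", LongType, nullable = true)
  ))

  // Read a CSV file with a header row, using the declared schema
  // rather than letting Spark infer the column types.
  def readCsv(spark: SparkSession, path: String): DataFrame =
    spark.read
      .format("csv")
      .option("header", "true")
      .schema(schema)
      .load(path)
}
```

With the columns declared as LongType up front, an expression such as "x * y" is evaluated on longs from the start.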