val inputfile = sqlContext.read
.format("com.databricks.spark.csv")
.option("header", "true")
.option("inferSchema", "true")
.option("delimiter", "\t")
.load("data")
inputfile: org.apache.spark.sql.DataFrame = [a: string, b: bigint, c: boolean]
val outputfile = inputfile.groupBy($"a", $"b").max("c")
The above code fails because c is a boolean column, and aggregates such as max cannot be applied to booleans. Is there a function in Spark that converts true to 1 and false to 0 for the full column of a Spark DataFrame?
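Something like the sketch below is what I am after (assuming the when/otherwise functions from org.apache.spark.sql.functions can be applied to a boolean column this way; I have not verified this):

import org.apache.spark.sql.functions.when

// hypothetical: rewrite the boolean column c as 0/1 so that max can be applied
val converted = inputfile.withColumn("c", when($"c", 1).otherwise(0))
val outputfile = converted.groupBy($"a", $"b").max("c")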
I tried the following (source: How to change column types in Spark SQL's DataFrame?):
val inputfile = sqlContext.read
.format("com.databricks.spark.csv")
.option("header", "true")
.option("inferSchema", "true")
.option("delimiter", "\t")
.load("data")
import org.apache.spark.sql.functions.udf

// toInt was not defined in my snippet; following the linked answer's pattern,
// I wrote it as a UDF that maps a boolean to 0/1
val toInt = udf((b: Boolean) => if (b) 1 else 0)

val tempfile = inputfile.select("a", "b", "c").withColumn("c", toInt(inputfile("c")))
val outputfile = tempfile.groupBy($"a", $"b").max("c")
The following question, Casting a new derived column in a DataFrame from boolean to integer, answers this for PySpark, but I am looking for a function that works in Scala.
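If the Scala counterpart of that PySpark answer is simply Column.cast, a minimal sketch (untested against my data) would be:

val tempfile = inputfile.withColumn("c", inputfile("c").cast("int"))
val outputfile = tempfile.groupBy($"a", $"b").max("c")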
I would appreciate any help.