newbie to Scala and Spark here.
I'm passing in a CSV as a dataframe, and I want to convert the fields that match a Regex expression from String to Boolean.
My solution:
val df_in = spark.read.option("header", true).csv("./data/data.csv") // load csv
val cast_in = df_in.select(
df_in.columns.map {
case col if col.matches("Answer\\.[aq](.*)".r.toString()) => // check regex
functions.col(col).cast(BooleanType)
case col => functions.col(col)
}: _*
)
However, the problem to this is that I have a .
in multiple of my column names, such as Input.html
and hit this error:
Exception in thread "main" org.apache.spark.sql.AnalysisException: cannot resolve 'Input.html' given input columns:
I've read a few solutions saying that I need to use a backtick to group both sides of the .
together, but I'm not too sure if that's possible with my approach since I'm using variables.
Any guidance is appreciated!