I'm new to Scala and Spark.

I'm loading a CSV into a DataFrame, and I want to cast the columns whose names match a regex from String to Boolean.

My solution:

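import org.apache.spark.sql.functions
import org.apache.spark.sql.types.BooleanType

val df_in = spark.read.option("header", true).csv("./data/data.csv") // load the CSV with a header row

val cast_in = df_in.select(
  df_in.columns.map {
    case col if col.matches("Answer\\.[aq](.*)") => // column name matches the regex
      functions.col(col).cast(BooleanType)
    case col => functions.col(col) // every other column passes through unchanged
  }: _*
)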

However, several of my column names contain a ., such as Input.html, and I hit this error:

Exception in thread "main" org.apache.spark.sql.AnalysisException: cannot resolve 'Input.html' given input columns:

I've read a few solutions saying that I need to wrap the column name in backticks so that Spark treats the . as part of the name rather than as a nested-field accessor, but I'm not sure whether that's possible with my approach, since the column names come from variables.
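For reference, this is roughly what I imagine the backtick approach would look like with variables. It's only a sketch: quoted and castAnswers are names I made up, it assumes none of the column names contain a backtick themselves, and I haven't verified it against my data.

```scala
import org.apache.spark.sql.{DataFrame, functions}
import org.apache.spark.sql.types.BooleanType

object CastAnswerColumns {
  // Hypothetical helper: wrap a raw column name in backticks so that
  // functions.col parses it as one identifier instead of struct.field.
  // Assumes the name itself contains no backtick.
  def quoted(name: String): String = "`" + name + "`"

  // Cast every column whose name matches the regex to Boolean and
  // select all other columns unchanged.
  def castAnswers(df: DataFrame): DataFrame =
    df.select(
      df.columns.map {
        case name if name.matches("Answer\\.[aq].*") =>
          // Alias back to the original name so the dotted name is kept
          functions.col(quoted(name)).cast(BooleanType).as(name)
        case name =>
          functions.col(quoted(name))
      }: _*
    )
}
```

With that in place, I'd expect to call it as val cast_in = CastAnswerColumns.castAnswers(df_in).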

Any guidance is appreciated!
