I have an RDD that I've converted into a Spark SQL DataFrame. I want to apply a number of column transformations with UDFs, which ends up looking something like this:
```python
df = df.withColumn("col1", udf1(df.col1)) \
       .withColumn("col2", udf2(df.col2)) \
       ...
       .withColumn("newcol", udf(df.oldcol1, df.oldcol2)) \
       .drop(df.oldcol1).drop(df.oldcol2) \
       ...
```
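For concreteness, here's a minimal self-contained version of the pattern (`udf1`, `udf2`, and `combine_udf` are toy stand-ins for my real UDFs):

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import udf
from pyspark.sql.types import IntegerType, StringType

spark = SparkSession.builder.getOrCreate()

# Toy frame with the same shape as my real data.
df = spark.createDataFrame(
    [(1, "a", 10, 20)], ["col1", "col2", "oldcol1", "oldcol2"]
)

# Stand-in UDFs; the real ones are more involved.
udf1 = udf(lambda x: x + 1, IntegerType())
udf2 = udf(lambda s: s.upper(), StringType())
combine_udf = udf(lambda a, b: a + b, IntegerType())

df = (df.withColumn("col1", udf1(df.col1))
        .withColumn("col2", udf2(df.col2))
        .withColumn("newcol", combine_udf(df.oldcol1, df.oldcol2))
        .drop("oldcol1")
        .drop("oldcol2"))
```

My real pipeline has many more of these chained calls.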
Is there a more concise way to express this (both the repeated `withColumn` and `drop` calls)?