I have an XML file that contains all the transformations I need to run over a DataFrame using the withColumn function, as shown below. How can I apply them to the DataFrame?
I had written code using Scala ToolBox and runTmirror, which internally compiles the code and runs these rules over the DataFrame. This worked perfectly for fewer than 100 columns. But the requirement has now changed and the number of columns has increased from 80 to 210, so this code fails with a StackOverflow error, which is an open issue for Scala 2.11 (https://github.com/scala/bug/issues/10026).
So I want to use a Spark utility instead of Scala ToolBox. I have also tried foldLeft, but it gives an error as well, because the rules come out of the XML as strings and I am not able to pass a column function (like lit or concat) as a Column type.
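For context, here is a minimal sketch of the foldLeft shape I mean. The object name FoldLeftSketch, the helper applyRules and the sample data are illustrative assumptions, not my actual code. It only works when each rule is already a Column value written in Scala; my problem is that the XML gives me each rule as a String.

```scala
import org.apache.spark.sql.{Column, DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, concat, lit}

// Hypothetical names throughout; this is a sketch, not the real job.
object FoldLeftSketch {
  // Applies each (column name, expression) pair with withColumn, in order.
  def applyRules(df: DataFrame, rules: Seq[(String, Column)]): DataFrame =
    rules.foldLeft(df) { case (acc, (name, expression)) =>
      acc.withColumn(name, expression)
    }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("fold-left-sketch")
      .getOrCreate()
    import spark.implicits._

    // Made-up sample data standing in for the real input.
    val df = Seq(("a", "b")).toDF("columnC", "columnD")

    // Rules written directly in Scala are already Column values,
    // so they can be folded over the DataFrame without any compilation step.
    val rules: Seq[(String, Column)] = Seq(
      "col2" -> lit("ABC"),
      "col3" -> concat(col("columnC"), col("columnD"))
    )

    applyRules(df, rules).show()
    spark.stop()
  }
}
```

With the XML file, the right-hand side of each pair is text such as lit("ABC") rather than a Column, which is exactly the step I cannot get past without ToolBox.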
XML Rule Files:
<?xml version="1.0" encoding="utf-8" ?>
<root>
<columns>
  <column name="col1">
    <![CDATA[data("columnA")]]>
  </column>
  <column name="col2">
    <![CDATA[lit("ABC")]]>
  </column>
  <column name="col3">
    <![CDATA[concat(col(columnC),col(columnD))]]>
  </column>
  <column name="col4">
    <![CDATA[regexp_replace(regexp_replace(regexp_replace(col("ColumnE"), "\\,|\\)", "") , "\\(", "-") , "^(-)$", "0").cast("double")]]>
  </column>
  <column name="col5">
    <![CDATA[lit("")]]>
  </column>
.
.
.
.
.
</columns>
</root>
The operations I need to apply are:
df.withColumn("col1", data("columnA"))
  .withColumn("col2", lit("ABC"))
  .withColumn("col3", concat(col(columnC), col(columnD)))
  .withColumn("col4", regexp_replace(regexp_replace(regexp_replace(col("ColumnE"), "\\,|\\)", ""), "\\(", "-"), "^(-)$", "0").cast("double"))
  .withColumn("col5", lit("")).........
Versions I am using:
Scala 2.11.12
Spark 2.4.3