I want to validate whether a Spark SQL expression is syntactically correct without actually running it on the cluster.
Below is the Spark expression:
(NOT(isnull(column_name)) AND NOT(length(trim(column_name))=0) AND NOT(nvl(CAST(column_name as INT) = CAST(12345 AS INT),false)))
I have already tried SparkSqlParser as the parser for the Spark SQL expression, but it does not fully guard against incorrect syntax in expressions.
For example, if I remove the argument (column_name) from the isnull function, SparkSqlParser still reports the syntax as valid, even though the expression is not actually correct.
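To illustrate, here is a minimal sketch of the kind of check I am doing; I am using CatalystSqlParser as the parser entry point here for illustration. It shows that a zero-argument isnull() still parses, because the parser only checks grammar, while argument counts are validated later during analysis:

```scala
import org.apache.spark.sql.catalyst.parser.{CatalystSqlParser, ParseException}

// Returns true if the expression passes the grammar check only.
// Semantic problems (wrong arity, unknown columns) are NOT caught here.
def isParsable(expr: String): Boolean =
  try {
    CatalystSqlParser.parseExpression(expr)
    true
  } catch {
    case _: ParseException => false
  }

println(isParsable("isnull()"))           // parses, although semantically wrong
println(isParsable("isnull(column_name")) // unbalanced parenthesis: fails to parse
```

So a pure parse check accepts `isnull()` and only rejects genuinely malformed input such as an unbalanced parenthesis.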
I want to validate the syntax of the expression before executing it against the dataset. For example, while executing datasetObj.selectExpr("nvl(length(column_Name)) as Rule1"), Spark throws an exception:
"org.apache.spark.sql.AnalysisException: Invalid number of arguments for function nvl. Expected: 2; Found: 1"
This is because the built-in function nvl requires two arguments. So I want to validate the syntax of each function call in an expression before executing it against the dataset.
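One workaround I am considering (a sketch, not necessarily the right approach) is to let Spark's analyzer do this validation locally: constructing a Dataset analyzes the logical plan eagerly but executes nothing, so calling selectExpr on an empty DataFrame that has the same column names as the real dataset surfaces the AnalysisException for the bad nvl call without launching any job. The local[1] session, probe DataFrame, and column name below are assumptions for illustration:

```scala
import org.apache.spark.sql.{AnalysisException, SparkSession}

val spark = SparkSession.builder()
  .master("local[1]")
  .appName("expression-check")
  .getOrCreate()
import spark.implicits._

// Empty DataFrame with the same column names as the real dataset;
// analysis only needs the schema, never the data.
val probe = Seq.empty[String].toDF("column_Name")

// Right(()) if the expression analyzes cleanly, Left(message) otherwise.
def validate(expr: String): Either[String, Unit] =
  try {
    // Dataset construction runs the analyzer eagerly but triggers no job.
    probe.selectExpr(expr)
    Right(())
  } catch {
    case e: AnalysisException => Left(e.getMessage)
  }

println(validate("nvl(length(column_Name), 0) as Rule1")) // analyzes fine
println(validate("nvl(length(column_Name)) as Rule1"))    // wrong arity: Left(...)

spark.stop()
```

This catches both parse errors and semantic errors (wrong argument counts, unknown columns or functions) using the same checks Spark applies at execution time, while staying entirely on the driver.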