
I want to validate whether a Spark expression is syntactically correct without actually running it on the cluster.
Below is the Spark expression:
(NOT(isnull(column_name)) AND NOT(length(trim(column_name))=0) AND NOT(nvl(CAST(column_name as INT) = CAST(12345 AS INT),false)))

I have already tried SparkSqlParser as the parser for the Spark SQL expression, but it does not fully guard against incorrect expressions.
For example, if I remove the parameter (column_name) from the isnull function, SparkSqlParser still reports the syntax as correct, but in practice the expression is invalid.
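This behaviour can be reproduced with the standalone Catalyst parser (a sketch; `CatalystSqlParser` is the session-independent counterpart of `SparkSqlParser`). The parser only checks grammar, so a function call with the wrong number of arguments still parses into an unresolved function node, and no parse error is raised:

```scala
import org.apache.spark.sql.catalyst.parser.CatalystSqlParser

// Grammar-only check: "isnull()" is syntactically valid SQL even though
// isnull requires one argument, so parsing succeeds.
val parsed = CatalystSqlParser.parseExpression("isnull()")
println(parsed) // an UnresolvedFunction node; argument counts are not checked here
```

Argument counts and column references are only verified later, during analysis, which is why a parser-level check alone cannot catch this class of error.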

I want to validate the syntax of the expression before executing it against the dataset. For example, while executing `datasetObj.selectExpr("nvl(length(column_Name)) as Rule1")`, Spark throws an exception: "org.apache.spark.sql.AnalysisException: Invalid number of arguments for function nvl. Expected: 2; Found: 1".
This is because the built-in function nvl requires two arguments. So I want to validate the syntax of each function in an expression before executing it against the dataset.
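One way to get these analysis-time checks without touching the real data is to let the analyzer resolve the expression against an empty DataFrame that has the same schema. Below is a minimal sketch: `validateExpr` is a hypothetical helper (not a Spark API), and the single `column_name` column is an assumption standing in for the real schema. Analysis runs on the driver only, so no cluster job is executed:

```scala
import org.apache.spark.sql.{AnalysisException, SparkSession}

// Hypothetical helper: resolve the expression against an empty DataFrame
// with the target schema. Dataset construction runs the analyzer eagerly,
// so wrong argument counts and unknown columns surface here as
// AnalysisException, without executing any job on the cluster.
def validateExpr(spark: SparkSession, expr: String): Either[String, Unit] = {
  import spark.implicits._
  val empty = Seq.empty[String].toDF("column_name") // assumed schema
  try {
    empty.selectExpr(s"$expr as Rule1")
    Right(())
  } catch {
    case e: AnalysisException => Left(e.getMessage)
  }
}
```

With this approach, `validateExpr(spark, "nvl(length(column_name))")` returns a `Left` carrying the "Invalid number of arguments for function nvl" message, while a well-formed expression returns `Right(())`.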

  • Why do you want to avoid running before validation? Could you add a `.limit(1)` to your spark query so as to not process too much data / wait a long time? – tjheslin1 Oct 29 '21 at 10:05
  • Does this answer your question? [Is there a way to validate the syntax of raw spark sql query?](https://stackoverflow.com/questions/48865194/is-there-a-way-to-validate-the-syntax-of-raw-spark-sql-query) – blackbishop Oct 29 '21 at 11:00
  • ...or this? [What is the use of queryExecution in spark dataframe?](https://stackoverflow.com/questions/41716295/what-is-the-use-of-queryexecution-in-spark-dataframe) – mazaneicha Oct 29 '21 at 11:53

0 Answers