
I am trying to solve the following problem on Databricks (on Azure): I want to analyze the physical plan of a query before its execution, and fail the query if the physical plan contains a certain path. I need to analyze the physical plan rather than the logical plan because I want to block commands that read from a certain path, and when I use spark.read.parquet(path) the path does not show up in the logical plan but does show up in the physical plan. Further, I cannot use access restrictions, as I want to block this only for certain clusters in a Databricks workspace and not for all clusters.

I found QueryExecutionListener, which can be extended with a custom class that overrides the onSuccess and onFailure functions. However, these functions are only executed after the query has succeeded or failed, so they don't suit my case. Alternatively, I found that we can extend the Rule class from org.apache.spark.sql.catalyst.rules.Rule and override the apply function. However, in this scenario I can only analyze the logical plan and not the physical plan.
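To make the goal concrete, here is a minimal sketch of the kind of check I have in mind, written as an explicit call rather than an automatic hook. It assumes a classic PySpark DataFrame, where the JVM plan is reachable through the internal `_jdf` handle, and it assumes that file scans print their paths as a `Location: ...[...]` entry in the physical plan text. The helper names and the regular expression are hypothetical, and Spark may truncate long path lists in that text, so this is an illustration and not a reliable guard.

```python
import re

# Assumed plan text shapes, e.g.
#   "Location: InMemoryFileIndex(1 paths)[dbfs:/mnt/a, dbfs:/mnt/b]"
#   "Location: InMemoryFileIndex[dbfs:/mnt/a]"
_LOCATION_RE = re.compile(r"Location:\s*\w+(?:\([^)]*\))?\[([^\]]*)\]")


def scan_locations(plan_text):
    """Return every path listed in a 'Location: ...[...]' entry of the plan text."""
    paths = []
    for match in _LOCATION_RE.finditer(plan_text):
        # Naive split: a path that itself contains a comma would be broken up.
        paths.extend(p.strip() for p in match.group(1).split(",") if p.strip())
    return paths


def blocked_locations(plan_text, blocked_prefixes):
    """Return the scanned paths that start with any of the blocked prefixes."""
    prefixes = tuple(blocked_prefixes)
    return [p for p in scan_locations(plan_text) if p.startswith(prefixes)]


def assert_plan_allowed(df, blocked_prefixes):
    """Raise if the physical plan of `df` reads from a blocked path.

    Assumption: `df` is a classic PySpark DataFrame, so `df._jdf` exposes the
    JVM Dataset. Building the plan does not trigger a Spark job.
    """
    plan_text = df._jdf.queryExecution().executedPlan().toString()
    hits = blocked_locations(plan_text, blocked_prefixes)
    if hits:
        raise PermissionError(f"Query reads blocked path(s): {hits}")
```

With a placeholder path, calling `assert_plan_allowed(spark.read.parquet(path), ["dbfs:/mnt/secret"])` before any action would raise if the scan location matches. What I am actually looking for is a way to have a check like this run automatically for every query on a given cluster.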

sator_aa
  • I think you need to use the `SparkSessionExtensions` class; its `injectPlannerStrategy()` method can let you intercept physical plans. – Guoran Yun Feb 01 '23 at 10:00
  • Hi @sator_aa, could you provide more information about your code: what you tried and what you are expecting? – B. B. Naga Sai Vamsi Feb 15 '23 at 10:27
  • Hi @SaiVamsi. I have found a way to do the above by overriding the onJobStart() function of the SparkListener class. I have documented it as part of another question I had, over here: https://stackoverflow.com/questions/75373428/how-can-i-extract-the-filescan-location-or-filesourcescanexec-objects-from-a-s – sator_aa Feb 21 '23 at 10:52

0 Answers