
Can we use recursive Common Table Expressions (CTEs) in Spark SQL and run the recursion from PySpark?

If not, is there a way to do this in Python (e.g. with Pandas or UDFs) in Spark's distributed environment? I know of one solution that implements recursion, but it is written in Scala, which my project does not use.

landau
  • According to the official docs, it's going to be available in Spark 3: [link](https://spark.apache.org/docs/3.0.0-preview/sql-ref-syntax-qry-select-cte.html) – Steven Dec 12 '19 at 10:33
  • Does this answer your question? [recursive cte in spark SQL](https://stackoverflow.com/questions/52562607/recursive-cte-in-spark-sql) – blackbishop Dec 12 '19 at 10:53
  • Thanks, @Steven. That helped me stop running impatiently for an answer in Spark. – landau Dec 13 '19 at 03:26
  • Thanks, @blackbishop. Now I wonder which recursive implementations can be done in Python with Spark's help. To take advantage of Spark, I want to split the input data into groups, hand each group to a worker as a Pandas DataFrame, and run the recursion on each group there. Finally, I would combine the resulting Pandas DataFrames back into a single Spark DataFrame. But I'm not sure this will work. – landau Dec 13 '19 at 03:35
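
The plan in the last comment (group the data, recurse inside each group on a worker, then recombine) could look roughly like the sketch below. It is only an illustration under assumptions: the column names `group_id`, `id` and `parent_id`, the helper names `add_depth` and `add_depth_with_spark`, and the "depth in a parent/child tree" task are all made up for the example. It also assumes each group fits in one worker's memory, that the parent links contain no cycles, and that chains are shorter than Python's recursion limit. The Spark wiring uses `GroupedData.applyInPandas`, which needs Spark 3.0+ and pyarrow; on Spark 2.4 a grouped-map `pandas_udf` would play the same role.

```python
import pandas as pd


def add_depth(pdf: pd.DataFrame) -> pd.DataFrame:
    """Per-group step: recursively compute each row's depth in its tree.

    Plain pandas on purpose, so it can be tested without a cluster.
    Column names (id, parent_id) are assumptions for this sketch.
    """
    parent_of = dict(zip(pdf["id"], pdf["parent_id"]))
    cache = {}

    def depth(node):
        # Walk up towards the root; a missing parent marks a root (depth 0).
        if node not in cache:
            parent = parent_of.get(node)
            if parent is None or pd.isna(parent):
                cache[node] = 0
            else:
                cache[node] = 1 + depth(parent)
        return cache[node]

    out = pdf.copy()
    out["depth"] = [depth(node) for node in out["id"]]
    return out


def add_depth_with_spark(sdf):
    """Run add_depth once per group_id on the executors.

    `sdf` is assumed to be a Spark DataFrame with string columns
    group_id, id and parent_id. Spark splits it by group_id, passes each
    group to add_depth as a pandas DataFrame, and stitches the results
    back into one Spark DataFrame.
    """
    schema = "group_id string, id string, parent_id string, depth long"
    return sdf.groupBy("group_id").applyInPandas(add_depth, schema=schema)


# Local check of the per-group logic, no Spark required.
sample = pd.DataFrame(
    {
        "group_id": ["g1", "g1", "g1", "g2", "g2"],
        "id": ["a", "b", "c", "x", "y"],
        "parent_id": [None, "a", "b", None, "x"],
    }
)
result = pd.concat(add_depth(group) for _, group in sample.groupby("group_id"))
```

Keeping the recursive part as an ordinary pandas function means the recursion itself never touches Spark; Spark only decides which rows each call sees. The trade-off is that the recursion cannot cross group boundaries, so the grouping key has to contain everything a single recursive walk needs.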

0 Answers