I want to use Delta Lake on a Hadoop cluster with PySpark. The only installation guidance I have found for Delta Lake is the command below.

pyspark --packages io.delta:delta-core_2.11:0.1.0 \
--conf "spark.sql.extensions=io.delta.sql.DeltaSparkSessionExtension" \
--conf "spark.sql.catalog.spark_catalog=org.apache.spark.sql.delta.catalog.DeltaCatalog"

I have two questions:

  • What is the latest version of Delta Lake (<0.7) that is compatible with Apache Spark 2.4.3? I know it should be the Scala 2.11 build.
  • How do I install the Delta Lake package on a Hadoop cluster?

Thanks in advance.

blackbishop
P. Phalak
  • For compatibility, take a look at the page https://docs.delta.io/latest/releases.html. It seems that versions <0.7.0 are compatible with Spark 2.4.4+, and 0.7.0 with Spark 3.0 – Rayan Ral Aug 13 '20 at 09:01
  • I just added a detailed answer on how to install PySpark & Delta Lake that will help readers: https://stackoverflow.com/a/72455808/1125159 – Powers Jun 01 '22 at 02:19

1 Answer

The latest version of Delta that supports Spark 2.4.3 is 0.6.1 (GitHub branch). Use --packages io.delta:delta-core_2.11:0.6.1 and it should work out of the box.

Alex Ott
  • Thank you for confirming the Delta Lake version. I will be using Delta Lake on a Hadoop cluster via pyspark. Does pyspark --packages install the package? My understanding is that pyspark --packages starts an interactive session with the specified packages. – P. Phalak Aug 18 '20 at 12:28
  • `--packages` pulls the specified package, makes it available to the code, and caches it. You can also use it with `spark-submit` to start non-interactive jobs, etc. – Alex Ott Aug 18 '20 at 14:35
  • @AlexOtt Is there any way to skip the --packages part? For example, is it possible to add the JARs to some path so that Spark picks up the libraries automatically when it starts? – ASHISH M.G Sep 16 '20 at 16:40
  • Yes, you can put `spark.jars.packages` + list of packages into `spark-defaults.conf` – Alex Ott Sep 16 '20 at 16:43
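The `spark-defaults.conf` suggestion in the last comment could look roughly like the sketch below. It is only an illustration under a few assumptions: `add_delta_package` is a hypothetical helper name (not part of Spark or Delta Lake), the configuration directory is passed in explicitly (commonly `$SPARK_HOME/conf`), and the coordinates default to the `io.delta:delta-core_2.11:0.6.1` package named in the answer.

```shell
#!/bin/sh
# Sketch: add a spark.jars.packages entry to spark-defaults.conf so that
# pyspark / spark-submit resolve the package without an explicit --packages flag.
# add_delta_package is a hypothetical helper, not a Spark or Delta Lake command.
add_delta_package() {
    conf_dir="$1"                                   # assumed: Spark conf directory
    coords="${2:-io.delta:delta-core_2.11:0.6.1}"   # Maven coordinates from the answer
    conf_file="$conf_dir/spark-defaults.conf"

    # If spark.jars.packages is already set, leave it untouched;
    # multiple packages go on one line as a comma-separated list.
    if [ -f "$conf_file" ] && grep -q '^spark\.jars\.packages' "$conf_file"; then
        echo "spark.jars.packages already set in $conf_file" >&2
        return 0
    fi

    # spark-defaults.conf uses "key<whitespace>value" lines.
    printf 'spark.jars.packages %s\n' "$coords" >> "$conf_file"
}
```

Calling `add_delta_package "$SPARK_HOME/conf"` once on the machine that launches jobs should then let a plain `pyspark` or `spark-submit` pick up the package at startup. The package still has to be resolved from a Maven repository (or a local cache) the first time, so the launching host needs access to one.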