
I want to share UDFs I created in Scala with other clusters that our data scientists use with PySpark and Jupyter on EMR.

Is this possible? How?

  • Possible duplicate of [Using a Scala UDF in PySpark](https://stackoverflow.com/questions/41780141/using-a-scala-udf-in-pyspark) – zeapo Jul 03 '17 at 09:44
  • @zeapo Don't think so, as this is about sharing UDFs in Jupyter across EMR clusters, which could offer a feature like this. It's not possible in Spark directly *unless* people use a shared `SparkSession` in Spark Thrift Server, though. – Jacek Laskowski Jul 03 '17 at 09:46
  • It's not, because I want to be able to share existing functions and add them to the Spark catalog, instead of recreating them every time – Lior Baber Jul 03 '17 at 09:47
  • Do you want to share the same UDF across different EMR clusters (which I believe are therefore different SparkContexts)? Unless EMR _somehow_ gives you the UDF sharing feature it's not possible in Spark SQL. – Jacek Laskowski Jul 03 '17 at 09:50
  • Isn't there something similar to a shared Hive metastore? Or could something be added to the `spark-defaults.conf` file? – Lior Baber Jul 03 '17 at 09:51
  • @JacekLaskowski I believe that the OP just wants to share the code library and add it to his notebook/dashboard environment – eliasah Jul 03 '17 at 14:19

1 Answer

The answer linked in the comments above ([Using a Scala UDF in PySpark](https://stackoverflow.com/questions/41780141/using-a-scala-udf-in-pyspark)) indeed helps; the sketch below follows its approach.
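The core idea from that answer is to implement the UDF against Spark's Java UDF interfaces, so PySpark can register it by class name alone. A minimal sketch, assuming a hypothetical package `com.example.udfs` and a trivial uppercase UDF:

```scala
package com.example.udfs // hypothetical package name

import org.apache.spark.sql.api.java.UDF1

// Implementing Spark's Java UDF1 interface (rather than a plain Scala
// function) lets PySpark register this class by name, with no Python glue.
class ToUpper extends UDF1[String, String] {
  override def call(s: String): String =
    if (s == null) null else s.toUpperCase
}
```

With the JAR on the cluster's classpath, a PySpark notebook can register it by class name, e.g. `sqlContext.registerJavaFunction("to_upper", "com.example.udfs.ToUpper")` on Spark 2.1/2.2, or `spark.udf.registerJavaFunction(...)` on 2.3+ (passing the return type, e.g. `StringType()`, explicitly if needed), and then call it with `spark.sql("SELECT to_upper(name) FROM people")` or `expr("to_upper(name)")` from the DataFrame API.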

Create an uber JAR, put it in S3, and have a bootstrap action copy it from S3 into Spark's local jars folder on each cluster; it should then work.
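A minimal build sketch for the uber JAR, assuming sbt with the sbt-assembly plugin (the project name and versions below are placeholders to adapt):

```scala
// build.sbt — marks Spark as "provided" so the uber JAR contains only
// the UDF code and its real dependencies; EMR supplies Spark itself.
// Assumes project/plugins.sbt contains:
//   addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.14.5")

name := "shared-udfs"     // placeholder project name
version := "0.1.0"
scalaVersion := "2.11.8"  // match the Scala version Spark was built with

libraryDependencies += "org.apache.spark" %% "spark-sql" % "2.1.0" % "provided"
```

`sbt assembly` then produces the uber JAR under `target/scala-2.11/`; upload it to S3 and have the bootstrap action copy it into Spark's jars directory (typically `/usr/lib/spark/jars` on EMR 5.x) on every node, so both the driver and the executors can load the UDF classes.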
