Within an interactive PySpark session you can import Python files via sc.addPyFile('file_location'). If you then edit and save that file, is there any way to "re-broadcast" the updated version without shutting down the Spark session and starting a new one?

Simply adding the file again doesn't work. I'm not sure whether renaming the file works, but I don't want to do that anyway.

As far as I can tell from the Spark documentation, there is only a method to add a Python file, not one to update it. I'm hoping that I missed something!

Thanks

Jim
  • I haven't tested this, but could you just follow http://stackoverflow.com/questions/6946376/how-to-reload-a-class-in-python-shell within your pyspark-shell? – James Tobin Mar 02 '17 at 19:51
  • Have you tried setting spark.files.overwrite to true in your spark-defaults.conf? http://spark.apache.org/docs/latest/configuration.html – sparknoob Mar 02 '17 at 21:29
  • Hmm, I have not tried either of those. sparknoob, yours sounds like exactly what I'm looking for, so I'll try that first and get back to you. Thanks. – Jim Mar 02 '17 at 22:20
  • Nope, unfortunately those don't work for me. I tried both sc.addFile and sc.addPyFile, and neither of them seems to actually overwrite the files with my changes mid-session. Any other possible solutions? – Jim Mar 03 '17 at 14:17
  • Check out https://stackoverflow.com/a/44387776/1843329, although it sounds like that solution isn't reliable in practice. – snark Mar 29 '19 at 12:42

1 Answer

I don't think it's feasible during an interactive session. You will have to restart your session to use the modified module.
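
For context, here is a minimal sketch in plain Python (no Spark involved) of the reload approach mentioned in the comments. It shows only what happens inside a single Python process such as the driver: importing an already-loaded module again returns the cached copy from sys.modules, while importlib.reload re-executes the edited source. Whether this helps in PySpark is an assumption that this thread does not confirm, since executor Python workers keep their own copy of the shipped file and this sketch does not touch them. The module name helper_mod_demo is hypothetical.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Hypothetical module name, used only for this illustration.
MODULE_NAME = "helper_mod_demo"

# Do not write .pyc files, so the demo always compiles from the source on disk.
sys.dont_write_bytecode = True

# Create a throwaway directory holding the module and make it importable.
workdir = Path(tempfile.mkdtemp())
module_path = workdir / f"{MODULE_NAME}.py"
module_path.write_text("def version():\n    return 1\n")
sys.path.insert(0, str(workdir))
importlib.invalidate_caches()

# First import executes the file and caches the module in sys.modules.
helper = importlib.import_module(MODULE_NAME)
first = helper.version()

# Simulate editing and saving the file mid-session.
module_path.write_text("def version():\n    return 2  # edited\n")

# Importing again returns the cached module; the edit is not picked up.
helper_again = importlib.import_module(MODULE_NAME)
after_reimport = helper_again.version()

# importlib.reload re-executes the source in this process only.
importlib.reload(helper)
after_reload = helper.version()

print(first, after_reimport, after_reload)
```

Even where a reload like this works on the driver, code running inside tasks on the executors may still see the old module, which is consistent with a session restart being the dependable option.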

Wen Yao