
I have a setuptools-based package that I ship to and load in pyspark, with a setup.py file like:

from setuptools import setup

setup(
  name='mypackagename',
  version='0.9.10',
  ...
)

This package installs/ships, loads, and runs its code fine on the spark cluster. However, I am unable to get the package's version from within the package in order to log it. This works fine if I load the library in plain Python, but when the library is loaded in pyspark, `pkg_resources.get_distribution('mypackagename')` raises a `DistributionNotFound` exception.
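For reference, the version lookup inside the package looks roughly like this (a simplified sketch of my actual code; the logger name and the `'unknown'` fallback are just how I currently handle the failure, the `pkg_resources` call is the essential part):

```python
import logging

import pkg_resources

logger = logging.getLogger(__name__)

try:
    # Works when the package is imported in plain Python, but raises
    # DistributionNotFound when the same package is imported in pyspark.
    __version__ = pkg_resources.get_distribution('mypackagename').version
except pkg_resources.DistributionNotFound:
    # Log the failure rather than letting it pass silently, then fall
    # back to a placeholder version string.
    logger.exception("could not determine version of 'mypackagename'")
    __version__ = 'unknown'
```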

Is there a better way to get the package version in pyspark?
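(For context, the workaround I'm considering if nothing better turns up is to keep the version string in the package source itself and have setup.py read it back out, so that nothing at runtime depends on the installed distribution metadata. This is only a sketch; the file layout and the regex are illustrative.)

```python
# mypackagename/__init__.py -- the version lives in the code itself
__version__ = '0.9.10'
```

```python
# setup.py (sketch) -- read the version back out of the package so it
# is defined in exactly one place.
import re

from setuptools import setup

with open('mypackagename/__init__.py') as f:
    version = re.search(r"__version__\s*=\s*'([^']+)'", f.read()).group(1)

setup(
    name='mypackagename',
    version=version,
)
```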

Scott Willeke
  • That's strange. Judging by the [source](https://github.com/pypa/setuptools/blob/master/pkg_resources/__init__.py), `get_distribution()` should never return `None`. An old version? Maybe [stepping through the code](https://stackoverflow.com/questions/31245083/how-can-pyspark-be-called-in-debug-mode) shows something. – ivan_pozdeev Feb 04 '18 at 23:54
  • @ivan_pozdeev You're right, it was throwing `DistributionNotFound` and I had it in a try/catch with my variable initialized to `None` by default. I updated to clarify. Thanks! – Scott Willeke Feb 05 '18 at 02:12
  • "Error should never pass silently." (c) Python Zen. You should only handle an exception if you can do something intelligent about it; otherwise, it's a recipe for disaster as you just saw. – ivan_pozdeev Feb 05 '18 at 02:32
  • I agree with your lecture. When running tests locally handling the exception and returning 'unknown' _seemed_ intelligent at the time. Anyway, it does log the fact that there is an exception now. Back to the original question: Any ideas why this exception occurs only in spark? – Scott Willeke Feb 05 '18 at 02:35
  • Thank you @ivan_pozdeev, but that doesn't seem to be the same issue. The library executes lots of code just fine; it's only the version I'm struggling to get. – Scott Willeke Feb 05 '18 at 06:49

0 Answers