I am using datetime in some Python UDFs that I call from my Pig script. So far so good. I use Pig 0.12.0 on Cloudera 5.5. However, I also need the pytz or dateutil packages, and they don't seem to be part of a vanilla Python install.
Can I use them in my Pig UDFs in some way? If so, how? I think dateutil is installed on my nodes (I am not an admin, so how can I actually check that this is the case?), but when I type:
import sys
#I append the path to dateutil on my local windows machine. Is that correct?
sys.path.append('C:/Users/me/AppData/Local/Continuum/Anaconda2/lib/site-packages')
from dateutil import tz
in my udfs.py script, I get:
2016-08-30 09:56:06,572 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1121: Python Error. Traceback (most recent call last):
File "udfs.py", line 23, in <module>
from dateutil import tz
ImportError: No module named dateutil
when I run my Pig script.
All my other Python UDFs (using datetime, for instance) work just fine. Any idea how to fix that?
Many thanks!
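For the "how can I actually check" part, a minimal diagnostic sketch follows. It assumes nothing beyond the standard library. The names probe_modules and describe_interpreter are made up for illustration, and the syntax is kept old-fashioned on purpose so it should also parse on older interpreters such as Jython, which Pig uses for Python UDFs.

```python
import sys


def probe_modules(module_names):
    """Try to import each module; return {name: 'ok' or an error description}."""
    results = {}
    for name in module_names:
        try:
            __import__(name)
            results[name] = 'ok'
        except Exception:
            # sys.exc_info() is used instead of "except ... as e" so the
            # same code parses on very old interpreters too.
            err = sys.exc_info()[1]
            results[name] = '%s: %s' % (type(err).__name__, err)
    return results


def describe_interpreter():
    """Collect basic facts about the interpreter that runs this code."""
    return {
        'version': sys.version,
        # Jython reports a platform string that starts with 'java'.
        'is_jython': sys.platform.startswith('java'),
        # The directories that are actually searched for imports.
        'path': list(sys.path),
    }
```

Printing or logging probe_modules(['dateutil', 'pytz', 'six']) together with describe_interpreter() from inside udfs.py should show which interpreter is running and which directories it really searches.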
UPDATE
After playing a bit with the Python path, I am now able to
import dateutil
(at least Pig does not crash). But if I try:
from dateutil import tz
I get an error:
from dateutil import tz
File "/opt/python/lib/python2.7/site-packages/dateutil/tz.py", line 16, in <module>
from six import string_types, PY3
File "/opt/python/lib/python2.7/site-packages/six.py", line 604, in <module>
viewkeys = operator.methodcaller("viewkeys")
AttributeError: type object 'org.python.modules.operator' has no attribute 'methodcaller'
How can I overcome that? I use tz in the following manner:
to_zone = dateutil.tz.gettz('US/Eastern')
from_zone = dateutil.tz.gettz('UTC')
and then I change the timezone of my timestamps. Can I just import dateutil to do that? What is the proper syntax?
UPDATE 2
Following yakuza's suggestion, I am able to
import sys
sys.path.append('/opt/python/lib/python2.7/site-packages')
sys.path.append('/opt/python/lib/python2.7/site-packages/pytz/zoneinfo')
import pytz
but now I get an error again:
Caused by: Traceback (most recent call last):
File "udfs.py", line 158, in to_date_local
File "__pyclasspath__/pytz/__init__.py", line 180, in timezone
pytz.exceptions.UnknownTimeZoneError: 'America/New_York'
when I define
to_zone = pytz.timezone('America/New_York')
from_zone = pytz.timezone('UTC')
I found some hints in the question "UnknownTimezoneError Exception Raised with Python Application Compiled with Py2Exe".
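pytz reads each zone from a data file under its zoneinfo directory, and the `__pyclasspath__` prefix in the traceback suggests the package may have been loaded from a place where those files are not reachable. A small check like the one below can at least confirm whether the file for a zone exists on disk. The helper names are hypothetical, and the pytz directory passed in is an assumption based on the paths shown above.

```python
import os


def zone_file_path(pytz_dir, zone_name):
    """Build the path where the data file for zone_name would be expected."""
    # 'America/New_York' -> <pytz_dir>/zoneinfo/America/New_York
    return os.path.join(pytz_dir, 'zoneinfo', *zone_name.split('/'))


def zone_file_exists(pytz_dir, zone_name):
    """Return True if the zone data file is present as a regular file."""
    return os.path.isfile(zone_file_path(pytz_dir, zone_name))
```

For example, zone_file_exists('/opt/python/lib/python2.7/site-packages/pytz', 'America/New_York') returning False would point at missing data files rather than at a wrong zone name.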
What to do? Awww, I just want to convert timezones in Pig :(
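If neither package can be made to load, one possible fallback is to do the conversion with the standard datetime module only. The sketch below is not equivalent to pytz or dateutil: it assumes the input is a naive datetime in UTC, it handles US Eastern only, and it hard-codes the US daylight saving rule in force since 2007 (second Sunday of March to first Sunday of November), so earlier years would come out wrong. The function names are made up for illustration.

```python
from datetime import datetime, timedelta


def _nth_sunday(year, month, n):
    """Return midnight on the n-th Sunday of the given month."""
    first = datetime(year, month, 1)
    # weekday(): Monday is 0, Sunday is 6.
    days_to_sunday = (6 - first.weekday()) % 7
    return first + timedelta(days=days_to_sunday + 7 * (n - 1))


def utc_to_us_eastern(dt_utc):
    """Convert a naive UTC datetime to naive US Eastern wall-clock time."""
    year = dt_utc.year
    # DST starts at 02:00 EST (UTC-5), which is 07:00 UTC.
    dst_start_utc = _nth_sunday(year, 3, 2) + timedelta(hours=7)
    # DST ends at 02:00 EDT (UTC-4), which is 06:00 UTC.
    dst_end_utc = _nth_sunday(year, 11, 1) + timedelta(hours=6)
    if dst_start_utc <= dt_utc < dst_end_utc:
        offset_hours = -4  # daylight time (EDT)
    else:
        offset_hours = -5  # standard time (EST)
    return dt_utc + timedelta(hours=offset_hours)
```

Comparing in UTC sidesteps the ambiguous local hour at the end of daylight saving time, which is why the boundaries are expressed as UTC instants rather than local times.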