I have code that does something like this:
from pyspark.sql.functions import col, udf
from smb.SMBConnection import SMBConnection

@udf
def udf_use_smb(value):
    ...  # use SMBConnection to do something

df = df.withColumn("XXX", udf_use_smb(col(...)))
When I install the package with python3 -m pip install pysmb==1.2.7, this code runs without any issue.
But if I pip download the pysmb wheel file and then install it with pip3 install downloaded_pysmb.whl, the code fails with:
File "/mnt/yarn/usercache/hadoop/appcache/application_1652108392762_0002/container_1652108392762_0002_01_000002/pyspark.zip/pyspark/worker.py", line 589, in main
func, profiler, deserializer, serializer = read_udfs(pickleSer, infile, eval_type)
File "/mnt/yarn/usercache/hadoop/appcache/application_1652108392762_0002/container_1652108392762_0002_01_000002/pyspark.zip/pyspark/worker.py", line 447, in read_udfs
udfs.append(read_single_udf(pickleSer, infile, eval_type, runner_conf, udf_index=i))
File "/mnt/yarn/usercache/hadoop/appcache/application_1652108392762_0002/container_1652108392762_0002_01_000002/pyspark.zip/pyspark/worker.py", line 254, in read_single_udf
f, return_type = read_command(pickleSer, infile)
File "/mnt/yarn/usercache/hadoop/appcache/application_1652108392762_0002/container_1652108392762_0002_01_000002/pyspark.zip/pyspark/worker.py", line 74, in read_command
command = serializer._read_with_length(file)
File "/mnt/yarn/usercache/hadoop/appcache/application_1652108392762_0002/container_1652108392762_0002_01_000002/pyspark.zip/pyspark/serializers.py", line 172, in _read_with_length
return self.loads(obj)
File "/mnt/yarn/usercache/hadoop/appcache/application_1652108392762_0002/container_1652108392762_0002_01_000002/pyspark.zip/pyspark/serializers.py", line 458, in loads
return pickle.loads(obj, encoding=encoding)
ModuleNotFoundError: No module named 'smb'
I checked the master and core nodes of the EMR cluster: both methods install pysmb on all nodes, and I can import and use it from the pyspark shell. It looks like the import only fails on the executors.
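To narrow this down, a small probe along the following lines could report which Python interpreter a process is using and whether it can locate a module. This is only a sketch: the helper name probe_module is made up for illustration, and it assumes nothing beyond the standard library.

```python
import importlib.util
import socket
import sys


def probe_module(module_name):
    """Report whether module_name is importable by the current interpreter.

    probe_module is a hypothetical helper, not part of pyspark or pysmb.
    """
    try:
        spec = importlib.util.find_spec(module_name)
    except (ImportError, ValueError):
        # Treat lookup errors the same as "not found".
        spec = None
    return {
        "host": socket.gethostname(),
        "python": sys.executable,  # interpreter running this process
        "module": module_name,
        "found": spec is not None,
        "location": getattr(spec, "origin", None),
    }


if __name__ == "__main__":
    # Driver-side check.
    print(probe_module("smb"))
    # Executor-side check (assumes an active SparkContext named `sc`):
    #   sc.parallelize(range(8), 8).map(lambda _: probe_module("smb")).collect()
```

If the "python" path or the "found" flag differs between the driver and the executors, that would suggest the wheel was installed for a different interpreter or into a site-packages directory that the executor processes do not see (for example, a per-user install). That is a guess to be checked, not a confirmed cause.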
Any idea why installing from PyPI with pip works for the executors, but installing from a local wheel does not?
Thank you