
I have code that does something like this:

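from pyspark.sql.functions import col, udf
from smb.SMBConnection import SMBConnection

@udf
def udf_use_smb(value):
    # use SMBConnection to do something with the column value
    ...

# "XXX" stands in for the real column names
df = df.withColumn("XXX", udf_use_smb(col("XXX")))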

When I install the package with python3 -m pip install pysmb==1.2.7, this code runs without any issue.

But if I pip download the pysmb wheel files and then install them with pip3 install downloaded_pysmb.whl,

the code fails with:

  File "/mnt/yarn/usercache/hadoop/appcache/application_1652108392762_0002/container_1652108392762_0002_01_000002/pyspark.zip/pyspark/worker.py", line 589, in main
    func, profiler, deserializer, serializer = read_udfs(pickleSer, infile, eval_type)
  File "/mnt/yarn/usercache/hadoop/appcache/application_1652108392762_0002/container_1652108392762_0002_01_000002/pyspark.zip/pyspark/worker.py", line 447, in read_udfs
    udfs.append(read_single_udf(pickleSer, infile, eval_type, runner_conf, udf_index=i))
  File "/mnt/yarn/usercache/hadoop/appcache/application_1652108392762_0002/container_1652108392762_0002_01_000002/pyspark.zip/pyspark/worker.py", line 254, in read_single_udf
    f, return_type = read_command(pickleSer, infile)
  File "/mnt/yarn/usercache/hadoop/appcache/application_1652108392762_0002/container_1652108392762_0002_01_000002/pyspark.zip/pyspark/worker.py", line 74, in read_command
    command = serializer._read_with_length(file)
  File "/mnt/yarn/usercache/hadoop/appcache/application_1652108392762_0002/container_1652108392762_0002_01_000002/pyspark.zip/pyspark/serializers.py", line 172, in _read_with_length
    return self.loads(obj)
  File "/mnt/yarn/usercache/hadoop/appcache/application_1652108392762_0002/container_1652108392762_0002_01_000002/pyspark.zip/pyspark/serializers.py", line 458, in loads
    return pickle.loads(obj, encoding=encoding)
ModuleNotFoundError: No module named 'smb'

I checked the master and core nodes of the EMR cluster: both methods install pysmb on every node, and I can import and use it from the pyspark shell. It looks like it only fails on the executors.
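To narrow this down, it may help to ask the executors themselves which interpreter they run and whether that interpreter can see `smb`. The sketch below is only a diagnostic under assumptions: `probe_module` and `probe_executors` are hypothetical helper names, and `sc` is assumed to be an existing `SparkContext`. One thing it could reveal is whether `pip3` and `python3 -m pip` installed into different interpreters or site-packages directories.

```python
def probe_module(module_name):
    """Report whether the current interpreter can find module_name."""
    # Imports are local so the function stays self-contained when it is
    # pickled and shipped to an executor.
    import socket
    import sys
    from importlib.util import find_spec

    spec = find_spec(module_name)
    return {
        "host": socket.gethostname(),
        "python": sys.executable,
        "found": spec is not None,
        "location": getattr(spec, "origin", None),
    }


def probe_executors(sc, module_name, num_tasks=8):
    """Run probe_module inside executor tasks (sc is an assumed SparkContext)."""
    results = (
        sc.parallelize(range(num_tasks), num_tasks)
        .map(lambda _: probe_module(module_name))
        .collect()
    )
    # Keep one report per (host, interpreter) pair.
    return list({(r["host"], r["python"]): r for r in results}.values())
```

Calling `probe_module("smb")` on the driver and `probe_executors(spark.sparkContext, "smb")` for the executors (where `spark` is the usual session object) and comparing the `python` and `found` fields should show whether the two sides use the same interpreter.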

Any idea why installing with pip from PyPI works on the executors but installing from a local wheel does not?

Thank you

milton
  • Where are you installing the downloaded package? Make sure the executors can read from that location. – samkart May 11 '22 at 04:57
  • @samkart Hey Samkart, it is just a pip install on the master and all slave nodes, which installs into the same site-packages folder as a pip install from PyPI. – milton May 11 '22 at 12:09
  • You could try shipping the package; see [this](https://stackoverflow.com/a/24686708/8279585) – samkart May 11 '22 at 14:27
  • Yeah, I believe that will work. I just don't understand why pip install from PyPI works but installing from wheels does not. – milton May 12 '22 at 12:44
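
The "shipping the package" suggestion in the comments could look roughly like the sketch below. It is a workaround, not an explanation of the wheel behaviour. `SparkContext.addPyFile` accepts `.py`, `.zip`, and `.egg` files; the helper names `zip_package` and `ship_package`, and every path passed to them, are hypothetical. It assumes the package is pure Python, and any dependencies would need the same treatment.

```python
import os
import zipfile


def zip_package(package_dir, zip_path):
    """Zip a package directory so the package folder sits at the archive root."""
    package_dir = os.path.abspath(package_dir)
    parent = os.path.dirname(package_dir)
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for root, _dirs, files in os.walk(package_dir):
            for name in files:
                if name.endswith(".pyc"):
                    continue  # skip compiled bytecode
                full_path = os.path.join(root, name)
                # Store paths relative to the parent so "smb/..." is importable.
                archive.write(full_path, os.path.relpath(full_path, parent))
    return zip_path


def ship_package(sc, package_dir, zip_path):
    """Zip the package and register it with an assumed SparkContext `sc`."""
    sc.addPyFile(zip_package(package_dir, zip_path))
```

With this approach the archive is distributed with the job, so the executors do not depend on what is installed in each node's site-packages.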
