Hello guys,

My question is about Zeppelin notebooks. I am new to the Zeppelin environment. I have an AWS account and I am working on an EMR cluster. I want to use pandas and matplotlib in Zeppelin, but I get the error "No module named pandas" (and the same for matplotlib). I found this tutorial and got as far as Step 8, but I still get the same error. I also tried changing the Python path in the Zeppelin interpreter settings; even though I am sure the path is correct, the error persists. This link mentions the issue. If anyone has experience with this problem, please help me.

%pyspark
import os
import numpy
import pandas
import matplotlib

print("Numpy "+numpy.__version__)
print("Pandas "+pandas.__version__)
print("Matplotlib "+matplotlib.__version__)

Traceback (most recent call last):
File "/tmp/zeppelin_pyspark-444747300595843376.py", line 367, in <module>
raise Exception(traceback.format_exc())
Exception: Traceback (most recent call last):
File "/tmp/zeppelin_pyspark-444747300595843376.py", line 355, in <module>
exec(code, _zcUserQueryNameSpace)
File "<stdin>", line 3, in <module>  
ImportError: No module named pandas
yssefunc
  • Can you share the traceback? Maybe there are more clues there. Also, can you share the bootstrap file? – Elad Jun 04 '18 at 07:09
  • I shared my traceback. – yssefunc Jun 04 '18 at 07:17
  • Probably the wrong Python is being used. Please check `zeppelin.pyspark.python` in the interpreter settings. You can set it to the path of the Python you would like to use. – zjffdu Jun 05 '18 at 03:09
  • I know about the interpreter settings; I mentioned that above. I changed `zeppelin.pyspark.python`, and although I am sure the path is correct, I still get the same error. – yssefunc Jun 05 '18 at 06:17
  • Can you run the following code to verify whether the Python version being run is the correct one? `import sys; print(sys.version)` – zjffdu Jun 08 '18 at 01:56
  • Hi zjffdu, I am getting this output: "2.7.13 (default, Jan 31 2018, 00:17:36) [GCC 4.8.5 20150623 (Red Hat 4.8.5-11)]" – yssefunc Jun 08 '18 at 05:25

1 Answer

I realized that I had been taking the Python path from my EC2 machine instead of from the cluster. I connected over SSH to the master node on AWS, installed pandas and matplotlib there, and ran the `which python` command on that instance. I then copied the resulting path into the `zeppelin.pyspark.python` interpreter setting. After that, it worked.
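
For anyone debugging the same thing, here is a small diagnostic sketch that may help confirm which interpreter a paragraph is actually running and whether the packages are visible to it. The helper name `check_modules` is my own (hypothetical), not part of Zeppelin or PySpark; it only uses the standard library and should run on both Python 2.7 and Python 3. In Zeppelin you would put `%pyspark` on the first line of the paragraph.

```python
import importlib
import sys


def check_modules(names):
    """Return a dict mapping each module name to its version string,
    or to None if the module cannot be imported by this interpreter.
    (Hypothetical helper, for illustration only.)"""
    found = {}
    for name in names:
        try:
            module = importlib.import_module(name)
        except ImportError:
            found[name] = None
        else:
            # Not every module defines __version__, so fall back to a label.
            found[name] = getattr(module, "__version__", "unknown")
    return found


# The path printed here should match what `which python` reports on the
# node where the packages were installed.
print("Interpreter: {}".format(sys.executable))

results = check_modules(["numpy", "pandas", "matplotlib"])
for name in sorted(results):
    version = results[name]
    print("{}: {}".format(name, version if version else "NOT importable here"))
```

If the interpreter path printed here differs from the one where pandas and matplotlib were installed, that mismatch is the likely cause of the `ImportError`.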

yssefunc