
After setting up an AWS EC2 Linux server, I installed Anaconda and Spark on Hadoop as described in the following tutorial:

https://medium.com/@josemarcialportilla/getting-spark-python-and-jupyter-notebook-running-on-amazon-ec2-dec599e1c297

After launching several new EC2 instances, I still cannot solve the following problem in Jupyter Notebook:

from pyspark import SparkContext
sc = SparkContext()


---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-4-47c4965c5f0e> in <module>()
----> 1 from pyspark import SparkContext

~/spark-2.0.0-bin-hadoop2.7/python/pyspark/__init__.py in <module>()
     42 
     43 from pyspark.conf import SparkConf
---> 44 from pyspark.context import SparkContext
     45 from pyspark.rdd import RDD
     46 from pyspark.files import SparkFiles

~/spark-2.0.0-bin-hadoop2.7/python/pyspark/context.py in <module>()
     35     PairDeserializer, AutoBatchedSerializer, NoOpSerializer
     36 from pyspark.storagelevel import StorageLevel
---> 37 from pyspark.rdd import RDD, _load_from_socket, ignore_unicode_prefix
     38 from pyspark.traceback_utils import CallSite, first_spark_call
     39 from pyspark.status import StatusTracker

~/spark-2.0.0-bin-hadoop2.7/python/pyspark/rdd.py in <module>()
     45 from pyspark.join import python_join, python_left_outer_join, \
     46     python_right_outer_join, python_full_outer_join, python_cogroup
---> 47 from pyspark.statcounter import StatCounter
     48 from pyspark.rddsampler import RDDSampler, RDDRangeSampler, RDDStratifiedSampler
     49 from pyspark.storagelevel import StorageLevel

~/spark-2.0.0-bin-hadoop2.7/python/pyspark/statcounter.py in <module>()
     22 
     23 try:
---> 24     from numpy import maximum, minimum, sqrt
     25 except ImportError:
     26     maximum = max

~/anaconda3/lib/python3.6/site-packages/numpy/__init__.py in <module>()
    140         return loader(*packages, **options)
    141 
--> 142     from . import add_newdocs
    143     __all__ = ['add_newdocs',
    144                'ModuleDeprecationWarning',

~/anaconda3/lib/python3.6/site-packages/numpy/add_newdocs.py in <module>()
     11 from __future__ import division, absolute_import, print_function
     12 
---> 13 from numpy.lib import add_newdoc
     14 
     15 ###############################################################################

~/anaconda3/lib/python3.6/site-packages/numpy/lib/__init__.py in <module>()
      6 from numpy.version import version as __version__
      7 
----> 8 from .type_check import *
      9 from .index_tricks import *
     10 from .function_base import *

~/anaconda3/lib/python3.6/site-packages/numpy/lib/type_check.py in <module>()
      9            'common_type']
     10 
---> 11 import numpy.core.numeric as _nx
     12 from numpy.core.numeric import asarray, asanyarray, array, isnan, zeros
     13 from .ufunclike import isneginf, isposinf

AttributeError: module 'numpy' has no attribute 'core'
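
The traceback shows the import chain failing inside NumPy itself rather than in PySpark, so one way to narrow this down is to import NumPy directly in a plain Python shell. A minimal diagnostic sketch (illustrative only; the expected paths are assumptions based on the post):

import numpy

# Where numpy is actually being loaded from; a path outside
# ~/anaconda3/lib/python3.6/site-packages would suggest a stale or
# shadowing copy of numpy earlier on sys.path.
print(numpy.__file__)
print(numpy.__version__)

# This is the attribute the traceback says is missing; on a healthy
# install it is an ordinary submodule and imports without error.
import numpy.core
print(numpy.core)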

Thank you for your help.

MDO
  • Did you try upgrading the numpy package? – charles gomes Oct 01 '17 at 16:57
  • Yes, it says "unable to locate package numpy" – MDO Oct 01 '17 at 17:35
  • The version is 1.13.1 and it is located in the /home/ubuntu/anaconda3/lib/python3.6/site-packages directory – MDO Oct 01 '17 at 17:40
  • @charlesgomes Thank you for your answer. "sudo apt-get update numpy" gives "E: The update command takes no arguments", running /home/ubuntu/anaconda3/lib/python3.6/site-packages on its own gives "-bash: /home/ubuntu/anaconda3/lib/python3.6/site-packages: Is a directory", and "sudo apt-get install numpy" didn't work either. In the directory listing, numpy is colored blue while some other packages are white and green. I am not sure what I have missed – MDO Oct 01 '17 at 18:11
  • The Spark version was spark-2.0.0 instead of the newer 2.2.0; it worked after installing Spark 2.2.0 and updating the Spark home directory (see the sketch below). Thank you for your help @charlesgomes – MDO Oct 01 '17 at 19:01
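
For anyone hitting the same error, the resolution above amounts to pointing PySpark at a current Spark build. A minimal sketch, assuming Spark 2.2.0 was unpacked to /home/ubuntu/spark-2.2.0-bin-hadoop2.7 (a hypothetical path; adjust to your download) and that the findspark package is installed:

import os

# Point SPARK_HOME at the new Spark install; this path is an
# assumption -- use wherever you actually unpacked Spark 2.2.0.
os.environ['SPARK_HOME'] = '/home/ubuntu/spark-2.2.0-bin-hadoop2.7'

# findspark adds SPARK_HOME's python/ directory to sys.path so that
# 'import pyspark' picks up the new build instead of the old 2.0.0 one.
import findspark
findspark.init()

from pyspark import SparkContext
sc = SparkContext()
print(sc.version)  # should now report 2.2.0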

0 Answers