
I am attempting to run my own Scrapy project. The code is based on a well-written book, and the author provides a great VM playground for running the book's example scripts. In the VM the code works fine. However, when I tried to practice on my own machine, I received the following error:

  File "<frozen importlib._bootstrap>", line 978, in _gcd_import
  File "<frozen importlib._bootstrap>", line 961, in _find_and_load
  File "<frozen importlib._bootstrap>", line 950, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 655, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 678, in exec_module
  File "<frozen importlib._bootstrap>", line 205, in _call_with_frames_removed
  File "C:\users\me\dictionary_com\spiders\basic.py", line 3, in <module>
    import urlparse
ModuleNotFoundError: No module named 'urlparse'

I initially had Python 3 running on my main machine (outside the VM), and it seems the author was using Python 2 (I still don't know how Atom's flake8 checker was making sense of this). After reviewing Python 2/3 issues with urllib ("python 2 and 3 extract domain from url", "Heroku logs say "No module named 'urlparse'" when I use import urlparse" and https://github.com/FriendCode/gittle/issues/49), I tried the various import solutions provided in those links. I installed Python 2.7 and verified that it is on the PATH: `python -V` returns Python 2.7.13. I even tried creating a conda environment to make sure it was picking up Python 2.7.13.
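
A common Python 2/3-compatible form of this import is a try/except fallback. The sketch below assumes only `urlparse` and `urljoin` are needed: in Python 3 they live in `urllib.parse`, while in Python 2 they are in the top-level `urlparse` module. The dictionary.com URL is just an illustrative input. Note that this only helps if the code is actually run by the interpreter you think it is.

```python
try:
    # Python 3: the URL-parsing helpers were moved into urllib.parse
    from urllib.parse import urljoin, urlparse
except ImportError:
    # Python 2: the same helpers live in the top-level urlparse module
    from urlparse import urljoin, urlparse

# Example: extract the domain from a URL (works on either interpreter)
parts = urlparse("http://www.dictionary.com/browse/scrapy?s=t")
print(parts.netloc)  # prints www.dictionary.com

# Resolve a relative link against a base URL
print(urljoin("http://www.dictionary.com/browse/", "python"))
# prints http://www.dictionary.com/browse/python
```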

My spider script (basic.py) is as follows:

import datetime
import urlparse
import socket
import scrapy

from Terms.items import TermsItem
# you have to import processors from scrapy.loader to use it
from scrapy.loader.processors import MapCompose, Join
# you have to import Itemloader from scrapy.loader to use it
from scrapy.loader import ItemLoader


class BasicSpider(scrapy.Spider):
    name = "basic"
    allowed_domains = ["web"]
    start_urls = [i.strip() for i in open('lused.urls.txt').readlines()]

    def parse(self, response):
        l = ItemLoader(item=TermsItem(), response=response)

        # Load fields using XPath expressions
        l.add_xpath('term', '//h1[@class="head-entry"][1]/text()',
                    MapCompose(unicode.strip, unicode.title))
        l.add_xpath('definition', '//*[@class="def-list"][1]/text()',
                    MapCompose(unicode.strip, unicode.title))
        # Housekeeping fields
        l.add_value('url', response.url)
        l.add_value('project', self.settings.get('BOT_NAME'))
        l.add_value('spider', self.name)
        l.add_value('server', socket.gethostname())
        l.add_value('date', datetime.datetime.now())

        return l.load_item()
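
Two things in this spider are Python 2-only. `urlparse` is imported but never used (which is what flake8 is pointing at), so under Python 3 that import line can simply be deleted. The `unicode.strip` and `unicode.title` processors would then fail with a `NameError` when `parse` runs, because Python 3 has no `unicode` type; `str.strip` and `str.title` are the Python 3 equivalents. So that it runs without Scrapy installed, the sketch below uses a hypothetical stand-in, `map_compose`, which mimics the basic behaviour of Scrapy's `MapCompose` (pipe every value through each function in turn, dropping `None` results and flattening list results). In the real spider you would keep Scrapy's own processor and write `MapCompose(str.strip, str.title)`.

```python
def map_compose(*functions):
    """Hypothetical stand-in for scrapy's MapCompose: returns a callable that
    pipes every input value through each function in order."""
    def process(values):
        for func in functions:
            next_values = []
            for value in values:
                result = func(value)
                if result is None:
                    continue  # like MapCompose, drop None results
                if isinstance(result, (list, tuple)):
                    next_values.extend(result)  # and flatten list results
                else:
                    next_values.append(result)
            values = next_values
        return values
    return process


# Python 3 replacement for MapCompose(unicode.strip, unicode.title)
clean_term = map_compose(str.strip, str.title)
print(clean_term(["  hello world \n"]))  # prints ['Hello World']
```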

My items.py script is as follows:

from scrapy.item import Item, Field


class TermsItem(Item):
    # Primary fields
    term = Field()
    definition = Field()
    # Housekeeping fields
    url = Field()
    project = Field()
    spider = Field()
    server = Field()
    date = Field()

In the Atom editor, the flake8 Python checker flags/underlines these two lines as 'imported but not used':

'import urlparse'

'from scrapy.loader.processors import MapCompose, Join'

However, when I open the virtually identical code from the author's VM in Atom, it doesn't flag anything, and the code runs!?

Unfortunately, I still get the same error after all of the attempts above. I was hoping someone else has encountered this problem or can spot my error based on the details above.

R.Zane
  • The `frozen importlib._bootstrap` parts in the traceback mean it is running in Python 3, not Python 2. – Martijn Pieters Sep 07 '17 at 21:22
  • Thanks so much for the prompt reply. I will try uninstalling Python 3 altogether. Do you have any idea why it is running Python 3? When I type `python -V` in the terminal it gives me Python 2.7.13 (after I installed Python 2). I tried removing Scrapy using conda and reinstalling it, but I still have the same issue. – R.Zane Sep 07 '17 at 21:38
  • You didn't show how you are running the code. – Martijn Pieters Sep 07 '17 at 21:38
  • I am running the code (if I understand correctly) with `scrapy crawl basic.py`. I then attempted using a conda environment, following these resources: https://conda.io/docs/user-guide/tasks/manage-environments.html and https://www.youtube.com/watch?v=KmSZ5itfXmg. – R.Zane Sep 07 '17 at 21:45
  • So the `scrapy` command is installed in, and tied to, the Python 3 environment. I'm not familiar enough with Scrapy to tell you how to install it differently, however. – Martijn Pieters Sep 07 '17 at 21:46
  • OK. I am still so thankful that you set me on the right route to solving the problem. I will post the solution when I find it. Thanks again, I really appreciate it. – R.Zane Sep 07 '17 at 21:55
  • I (partially) solved the problem, with the appreciated direction from Martijn Pieters. As I noted above, I followed conda.io/docs/user-guide/tasks/manage-environments.html and https://www.youtube.com/watch?v=KmSZ5itfXmg. It seems that I need to use the Anaconda terminal to create a conda environment. I was initially using the PowerShell terminal, and I suspect that I need to add Miniconda to the PATH to use it in PowerShell. – R.Zane Sep 08 '17 at 01:30



I (partially) solved the problem, with the appreciated direction from Martijn Pieters. Scrapy was running under my system-installed Python (Python 3.6) instead of my conda environment's Python (Python 2.7). Based on conda.io/docs/user-guide/tasks/manage-environments.html and the Miniconda installation documentation, adding Miniconda to the PATH can cause the Miniconda environment to be picked up ahead of other environments (if I understand correctly). It seems that I need to use the Anaconda terminal to create a conda environment. I was initially using the PowerShell terminal, thinking that adding Miniconda to the PATH was sufficient. Hopefully I have explained this clearly enough to help others avoid my mistake.
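
Since the root cause was the `scrapy` command running under a different interpreter than expected, one way to confirm which Python actually executes the spider is to check `sys.version_info` and `sys.executable`. A minimal sketch follows; `require_python_major` is a hypothetical helper name, not part of Scrapy or conda.

```python
import sys


def require_python_major(major):
    """Hypothetical guard: raise a clear error if the running interpreter's
    major version is not the one this code was written for."""
    if sys.version_info[0] != major:
        raise RuntimeError(
            "This code targets Python %d, but it is running under Python %d.%d (%s)"
            % (major, sys.version_info[0], sys.version_info[1], sys.executable)
        )


# Report which interpreter is running this module (works on Python 2 and 3)
print("Running under %s (Python %s)" % (sys.executable, sys.version.split()[0]))
```

Calling `require_python_major(2)` at the top of the spider module, before `import urlparse`, would replace the confusing `ModuleNotFoundError` with a message naming the interpreter that `scrapy crawl` actually launched.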

Regards,

R.Zane