I need to scrape a webpage, and I normally use Scrapy. I need to follow some links that can only be opened through JavaScript, and they are nested inside some <ul> and <li> elements.
For example:
<ul class="level1">
  <li class="closed">            <-- this becomes "expanded" when opened
    <a href="javascript:etc...
    <ul class="level2">
      <li class="closed">
        <ul class="level3">
          <li class="track">
            <a href="this_is_the_url_that_I_want">
Now, do I need something other than Scrapy (I see that Selenium is often suggested), or can I use an XmlLinkExtractor? Or can I somehow extract the URL inside "level3" directly with Scrapy?
Thanks
EDIT: I'm trying to use Selenium, but I get:

File "/usr/lib/pymodules/python2.7/scrapy/spiderloader.py", line 40, in load
    raise KeyError("Spider not found: {}".format(spider_name))
KeyError: 'Spider not found: '
I'm naming the spider, so I don't understand what I've done wrong.
import scrapy
from selenium import webdriver


class audioSpider(scrapy.Spider):
    name = "audio"
    allowed_domains = ["audio.sample"]  # domain only, no http:// scheme
    start_urls = ["http://audio.sample/archive-project"]

    def __init__(self, *args, **kwargs):
        super(audioSpider, self).__init__(*args, **kwargs)
        self.driver = webdriver.Firefox()

    def parse(self, response):
        self.driver.get(response.url)
        # Click the level1 link to expand the menu.
        el1 = self.driver.find_element_by_xpath(
            '//ul[@class="level1"]/li[@class]/a')
        el1.click()
        # Drill down through level2 to the level3 track link.
        el2 = self.driver.find_element_by_xpath(
            '//*[@class="subNavContainer loaded"]/ul[@class="level2"]/li[@class]/a')
        el2.click()
        el3 = self.driver.find_element_by_xpath(
            '//*[@class="subNavContainer loaded"]//ul[@class="level3"]/li[@class="track"]/a')
        print el3.get_attribute("href")