I am trying to get the response...

<span id="BusinessDbaName" class="dataItem">TECHNO COATINGS INC</span>

scrapy is instead returning...

******name =  [u'<span id="BusinessDbaName" class="dataItem"></span>']

i.e. the HTML element is returned, but not the text content inside the tags.

Question: What would cause this, and how do I fix it?

Here is my source code:

import scrapy

class lniSpider(scrapy.Spider):
    name = "lni"
    allowed_domains = ["secure.lni.wa.gov"]
    start_urls = [
        "https://secure.lni.wa.gov/verify/Detail.aspx?UBI=602123234&SAW="
    ]

    def parse(self, response):
        for sel in response.xpath('//body'):
            # select the span by its id and dump the matching node(s)
            name = sel.xpath('//*[@id="BusinessDbaName"]').extract()
            print("******name = ", name)
  • Also, if I scrape the whole page I still run into the same issue of no content in the tags. – user2822565 Dec 22 '15 at 23:48
  • Did you try `//*[@id="BusinessDbaName"]/text()` as a selector? – Strikeskids Dec 22 '15 at 23:56
  • 2
    The data on that page is AXAJ loaded. You will not get it with scrapy, since it is not in the HTML. To get the data you should use the debugger of your browser to see what AJAX calls are made. – Klaus D. Dec 23 '15 at 00:08
  • @KlausD. That makes a lot more sense! So I should: 1. open the browser developer tools, network tab; 2. go to the target site; 3. click the submit button and see what XHR request goes to the server; 4. simulate this XHR request in the spider (sketched after these comments). – user2822565 Dec 23 '15 at 00:18
  • *note that supporting information was found here: http://stackoverflow.com/questions/16390257/scraping-ajax-pages-using-python – user2822565 Dec 23 '15 at 00:19
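For reference, a minimal sketch of step 4 from the comments (replaying the XHR request directly in the spider), assuming the network tab reveals a JSON endpoint. The endpoint URL and the "BusinessDbaName" key below are hypothetical placeholders, not the real Verify API:

import json
import scrapy

class LniAjaxSpider(scrapy.Spider):
    name = "lni_ajax"
    allowed_domains = ["secure.lni.wa.gov"]

    def start_requests(self):
        # Replace this placeholder with the actual request URL observed in the
        # browser's network tab when the Detail.aspx page loads its data.
        url = "https://secure.lni.wa.gov/verify/HYPOTHETICAL_DATA_ENDPOINT?UBI=602123234"
        yield scrapy.Request(url, callback=self.parse_api)

    def parse_api(self, response):
        # If the endpoint returns JSON, parse it directly instead of using
        # XPath; "BusinessDbaName" is a guess based on the element id in the
        # rendered page and may differ in the real response.
        data = json.loads(response.text)
        yield {"dba_name": data.get("BusinessDbaName")}

If the call instead returns an HTML fragment, the //*[@id="BusinessDbaName"]/text() selector suggested above should work against that response.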

0 Answers