I am trying to get the response...

<span id="BusinessDbaName" class="dataItem">TECHNO COATINGS INC</span>

scrapy is instead returning...

******name =  [u'<span id="BusinessDbaName" class="dataItem"></span>']

i.e. the HTML element is returned, but not the text content inside the tags.

Question: What would cause this, and how do I fix it?

Here is my source code:

import scrapy

class lniSpider(scrapy.Spider):
    name = "lni"
    allowed_domains = ["secure.lni.wa.gov"]
    start_urls = [
        "https://secure.lni.wa.gov/verify/Detail.aspx?UBI=602123234&SAW="
    ]

    def parse(self, response):
        for sel in response.xpath('//body'):
            # select the span by its id and dump the matching node(s)
            name = sel.xpath('//*[@id="BusinessDbaName"]').extract()
            print("******name = ", name)
  • Also, if I scrape the whole page I still run into the same issue of no content in the tags. – user2822565 Dec 22 '15 at 23:48
  • Did you try `//*[@id="BusinessDbaName"]/text()` as a selector? – Strikeskids Dec 22 '15 at 23:56
  • 2
    The data on that page is AXAJ loaded. You will not get it with scrapy, since it is not in the HTML. To get the data you should use the debugger of your browser to see what AJAX calls are made. – Klaus D. Dec 23 '15 at 00:08
  • @KlausD. That makes a lot more sense! So I should: 1. open the browser developer tools, network tab; 2. go to the target site; 3. click the submit button and see what XHR request goes to the server; 4. simulate this XHR request in the spider (sketched after these comments). – user2822565 Dec 23 '15 at 00:18
  • *note that supporting information was found here: http://stackoverflow.com/questions/16390257/scraping-ajax-pages-using-python – user2822565 Dec 23 '15 at 00:19
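For reference, a minimal sketch of step 4 from the comments (replaying the XHR request directly in the spider), assuming the network tab reveals a JSON endpoint. The endpoint URL and the "BusinessDbaName" key below are hypothetical placeholders, not the real Verify API:

import json
import scrapy

class LniAjaxSpider(scrapy.Spider):
    name = "lni_ajax"
    allowed_domains = ["secure.lni.wa.gov"]

    def start_requests(self):
        # Replace this placeholder with the actual request URL observed in the
        # browser's network tab when the Detail.aspx page loads its data.
        url = "https://secure.lni.wa.gov/verify/HYPOTHETICAL_DATA_ENDPOINT?UBI=602123234"
        yield scrapy.Request(url, callback=self.parse_api)

    def parse_api(self, response):
        # If the endpoint returns JSON, parse it directly instead of using
        # XPath; "BusinessDbaName" is a guess based on the element id in the
        # rendered page and may differ in the real response.
        data = json.loads(response.text)
        yield {"dba_name": data.get("BusinessDbaName")}

If the call instead returns an HTML fragment, the //*[@id="BusinessDbaName"]/text() selector suggested above should work against that response.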

0 Answers