Use scrapy to crawl a property within a block

Question

I'm using Scrapy to crawl a page that contains this element:

<input class="xxxmail" type="text" readonly="readonly" value="xxx.org">

I just need the "xxx.org". How do I retrieve it?

score 1 · Answer 1 · answered Apr 18 '14 at 16:56

You can use the following xpath expression:

//input[@class="xxxmail"]/@value

This will get the value attribute of an input tag with the "xxxmail" class.

In the spider callback, first instantiate a Selector from the response, then call extract() on the XPath result:

from scrapy.selector import Selector

sel = Selector(response)
print(sel.xpath('//input[@class="xxxmail"]/@value').extract())

alecxe

Thanks for your reply. I have tried your method, but it doesn't work. I think the problem is not the XPath, but that there is an upper-level div with style="display: block;" – willie Apr 18 '14 at 17:09
@willie I cannot say more without seeing the actual webpage you are crawling. – alecxe Apr 18 '14 at 17:11
Hi, it is like this – willie Apr 18 '14 at 17:16
@willie can you give a link to the webpage? Also, what error are you getting? Have you replaced `xxxmail` with `anonemail`? – alecxe Apr 18 '14 at 17:29
Hi, thanks for your reply. I don't get any errors, because the crawler can't find that XPath until the button is clicked. I want to crawl this page: http://newbrunswick.en.craigslist.ca/rvs/4443347993.html There is a "contact" button; only after you click it can you see the contact's email address, which is what I want to get. – willie Apr 28 '14 at 00:10
@willie the data you need is in a `div` with the `reply_options` class. Try the `//div[@class="reply_options"]` XPath. – alecxe Apr 28 '14 at 00:34
Hi, I have tried all these XPaths; that's why I believe a function to click the button is needed. Do you know how to do it? – willie Apr 29 '14 at 15:15
@willie you cannot perform a UI click action with `Scrapy`. If you think you need to click a button, check out the [selenium](http://selenium-python.readthedocs.org/) tool. Hope that helps. – alecxe Apr 29 '14 at 15:16
Thanks, I know. But I haven't found a useful example combining scrapy and selenium. – willie Apr 29 '14 at 15:28
For example: http://stackoverflow.com/questions/17975471/selenium-with-scrapy-for-dynamic-page – alecxe Apr 29 '14 at 15:28
Hi, I tried those examples but always get this warning: `ScrapyDeprecationWarning: SeleniumSpider.spiders.SeleniumSpider.SeleniumSpider inherits from deprecated class scrapy.spider.BaseSpider, please inherit from scrapy.spider.Spider. (warning only on first subclass, there may be others)` on the line `class SeleniumSpider(BaseSpider)`. Can you write a simple scheme showing how to use selenium in scrapy? Also, is it possible if I use CrawlSpider? – willie May 02 '14 at 01:41
