
I'm using Scrapy to crawl a page that contains this element:

<input class="xxxmail" type="text" readonly="readonly" value="xxx.org">

I only need the value, "xxx.org". How do I retrieve it?

alecxe
willie

1 Answer


You can use the following XPath expression:

//input[@class="xxxmail"]/@value

This selects the value attribute of every input element whose class attribute is exactly "xxxmail".

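In the spider callback, first instantiate a Selector from the response, then call extract() on the XPath result:

from scrapy.selector import Selector

sel = Selector(response)  # response is the object passed to the spider callback
print(sel.xpath('//input[@class="xxxmail"]/@value').extract())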
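
To check what that expression is meant to pick out without running a spider, here is a minimal sketch that uses only Python's standard library rather than Scrapy. It does not evaluate XPath; it mirrors the same condition (an input tag whose class attribute is exactly "xxxmail") with `html.parser`. The names `InputValueParser` and `input_values` are hypothetical helpers introduced for illustration.

```python
from html.parser import HTMLParser

# The element from the question, copied verbatim.
HTML = '<input class="xxxmail" type="text" readonly="readonly" value="xxx.org">'


class InputValueParser(HTMLParser):
    """Collect the value attribute of <input> tags with a given class."""

    def __init__(self, css_class):
        super().__init__()
        self.css_class = css_class
        self.values = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        # Exact match on the class attribute, like @class="xxxmail" in the XPath.
        if tag == "input" and attrs.get("class") == self.css_class:
            if "value" in attrs:
                self.values.append(attrs["value"])


def input_values(html, css_class):
    """Return the value attributes of all matching <input> tags in html."""
    parser = InputValueParser(css_class)
    parser.feed(html)
    parser.close()
    return parser.values


values = input_values(HTML, "xxxmail")
print(values)
```

Like the XPath, this only finds elements that are present in the HTML as downloaded. If the element is added to the page by JavaScript after a click, neither approach will see it in the raw response.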
alecxe
  • Thanks for your reply. I have tried your method, but it doesn't work. I think the problem is not the XPath, but that there is an upper-level div shown as style="display: block;" – willie Apr 18 '14 at 17:09
  • @willie I cannot say more without seeing the actual webpage you are crawling. – alecxe Apr 18 '14 at 17:11
  • Hi, it is like this
    – willie Apr 18 '14 at 17:16
  • @willie can you give a link to the webpage? Also, what error are you getting? Have you replaced `xxxmail` with `anonemail`? – alecxe Apr 18 '14 at 17:29
  • Hi, thanks for your reply. I don't get any errors, because the crawler can't find that XPath before the button is clicked. I want to crawl this page: http://newbrunswick.en.craigslist.ca/rvs/4443347993.html There is a "contact" button; only after you click it can you see the contact's email address, which is what I want to get. – willie Apr 28 '14 at 00:10
  • @willie the data you need is in `div` with `reply_options` class. Try `//div[@class="reply_options"]` xpath. – alecxe Apr 28 '14 at 00:34
  • Hi, I have tried all these XPaths, which is why I believe a function to click the button is needed. Do you know how to do it? – willie Apr 29 '14 at 15:15
  • @willie you cannot make a UI click action with `Scrapy`. If you think you need to click a button, check [selenium](http://selenium-python.readthedocs.org/) tool. Hope that helps. – alecxe Apr 29 '14 at 15:16
  • Thanks, I know. But I haven't found a useful example combining Scrapy and Selenium. – willie Apr 29 '14 at 15:28
  • For example: http://stackoverflow.com/questions/17975471/selenium-with-scrapy-for-dynamic-page – alecxe Apr 29 '14 at 15:28
  • Hi, I tried those examples but always get this warning: ScrapyDeprecationWarning: SeleniumSpider.spiders.SeleniumSpider.SeleniumSpider inherits from deprecated class scrapy.spider.BaseSpider, please inherit from scrapy.spider.Spider. (warning only on first subclass, there may be others) class SeleniumSpider(BaseSpider) Can you write a simple outline of how to use Selenium in Scrapy? Also, is it possible if I use CrawlSpider? – willie May 02 '14 at 01:41