
I'm working on scraping review dates from TripAdvisor. Most of the first ones use a relative date and the rest use the normal MM/DD/YYYY format, but on closer inspection I see that the relative-date element looks like this:

<span class="ratingDate relativeDate" title="20 June 2015">Reviewed 4 weeks ago
</span>

I am using this XPath to get the data:

response.xpath('//div[@class="col2of2"]//span[@class="ratingDate relativeDate" or @class="ratingDate"]/text()').extract()

My question is: how do I add `@title` so that I get the title attribute, which holds the date in the normal format?

I tried

response.xpath('//div[@class="col2of2"]//span[@class="ratingDate relativeDate"/@title or @class="ratingDate"]/text()').extract()

response.xpath('//div[@class="col2of2"]//span[@class="ratingDate relativeDate" or @class="ratingDate"]/@title/text()').extract()
Smashed
  • Also, I forgot to mention that I cannot have two separate XPaths because it is hard to format them in the pipeline which prints to CSV – Smashed Jul 20 '15 at 07:14
  • Why not? It is really easy to set the item's field to one of those XPath results. In this case the solution is transparent for your pipeline. – GHajba Jul 20 '15 at 07:15
  • I just realised that I can set it to the same field: the relative-date XPath fills it until those run out and then the second one takes over, which lets me use two XPaths. But I still can't figure out how to get the title attribute – Smashed Jul 20 '15 at 07:19
  • Figured it out: I was calling text() when I shouldn't have. `response.xpath('//div[@class="col2of2"]//span[@class="ratingDate relativeDate"]/@title').extract()` – Smashed Jul 20 '15 at 07:20
  • Never mind, it needs the text(), which the title does not have – Smashed Jul 20 '15 at 07:40
  • So all you need is the `title` attribute, hence the problem has been solved using xpath `//div....../@title`? – har07 Jul 20 '15 at 08:05
  • Finally figured it out, thanks @GHajba, @har07 – Smashed Jul 20 '15 at 08:32

1 Answer


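Figured it out: in the spider you have to handle both cases, because either XPath may or may not return values for a given page. `extract()` returns an empty list when nothing matches, so you can pull the `title` attribute from the relative-date spans and then append the text of the plain-date spans to the same field.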

Here's my rendition.

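# Relative dates: the absolute date is in the title attribute.
item['date'] = sel.xpath('//*[@class="ratingDate relativeDate"]/@title').extract()
# Plain dates: append the span text to the same field.
item['date'] += sel.xpath('//div[@class="col2of2"]//span[@class="ratingDate"]/text()').extract()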
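
A per-span version of the same idea may be easier to reason about: take `title` when it is present and fall back to the element text otherwise, so the dates stay in page order. This is only a rough sketch using the standard library's `xml.etree.ElementTree` rather than Scrapy, and it assumes well-formed markup. `SAMPLE`, `review_dates`, and the date text in the second span are made up for illustration; only the first span comes from the question.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup modelled on the snippet in the question.
# The second span's text is an invented example of a plain-date review.
SAMPLE = """
<div class="col2of2">
  <span class="ratingDate relativeDate" title="20 June 2015">Reviewed 4 weeks ago
  </span>
  <span class="ratingDate">Reviewed 05/14/2015</span>
</div>
"""


def review_dates(markup):
    """Return one date string per review-date span, in document order."""
    root = ET.fromstring(markup)
    dates = []
    for span in root.iter("span"):
        classes = (span.get("class") or "").split()
        if "ratingDate" not in classes:
            continue  # not a review-date span
        title = span.get("title")
        if title:
            # Relative date: the absolute date lives in the title attribute.
            dates.append(title)
        else:
            # Plain date: fall back to the element's own text.
            dates.append((span.text or "").strip())
    return dates


print(review_dates(SAMPLE))
```

In a spider the same branch could presumably be applied to each selected span before assigning the list to `item['date']`, which keeps the pipeline unaware that two formats exist.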
Smashed