
Trying to pull the product name from a page:

https://www.v12outdoor.com/view-by-category/rock-climbing-gear/rock-climbing-shoes/mens.html

I can't find an XPath that returns a useful, specific result.

Apologies for my first post being such a beginner question :(

import scrapy


class V12Spider(scrapy.Spider):
    name = 'v12'
    start_urls = ['https://www.v12outdoor.com/view-by-category/rock-climbing-gear/rock-climbing-shoes/mens.html']

    def parse(self, response):
        yield {
            'price': response.xpath('//span[@id="product-price-26901"]/text()'),
            'name': response.xpath('//h3[@class="product-name"]/a/text()'),
        }

For name, I expected to get the product name from each h3 tag with class product-name, but instead it generates multiple rows of data='\r\n

(Whilst we're at it: for price, is there any way to pull out only the numerical values?)

PRERNA PAL
Lillap

1 Answer


The xpath() call returns selector objects, not strings. Call .get() on the result to extract the text, then use the string's strip() method to remove the surrounding whitespace. I tried something like this:

name= response.xpath('//h3[@class="product-name"]/a/text()').get()

Gives

'\r\n                                RED CHILLI VOLTAGE                            '

Then using

name.strip()

gives

'RED CHILLI VOLTAGE'

So you can replace your name statement with

name = response.xpath('//h3[@class="product-name"]/a/text()').get().strip()

The same solution works for price: just add .get().strip() at the end of your statement.

Hopefully this helps. Also read about the .getall() method, which returns every match as a list, at https://docs.scrapy.org/en/latest/topics/selectors.html
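On the follow-up about pulling out only the numerical part of the price, and on using strip() together with getall(): here is a minimal sketch in plain Python, operating on the strings that .get() and .getall() hand back. The helper names clean and price_number, the second shoe name, and the sample price are made up for illustration, and treating a comma as a thousands separator is an assumption about the site's price format.

```python
import re


def clean(text):
    """Strip surrounding whitespace; pass None through (.get() returns None when nothing matches)."""
    return text.strip() if text is not None else None


def price_number(text):
    """Return the first decimal number found in text as a float, or None if there is none.

    Assumption: a comma is a thousands separator, so it is dropped before matching.
    """
    match = re.search(r"\d+(?:\.\d+)?", (text or "").replace(",", ""))
    return float(match.group()) if match else None


# Stand-in for what .getall() returns: a list of raw strings (sample data).
raw_names = ["\r\n        RED CHILLI VOLTAGE        ", "\r\n        SAMPLE SHOE        "]

# strip() is a string method, so with getall() it is applied per item.
names = [clean(n) for n in raw_names]

# Stand-in for what .get() returns for a price node (made-up value).
price = price_number("\r\n        £89.99        ")

print(names)
print(price)
```

Calling .strip() directly on the list from getall() fails because a list has no strip() method, hence the list comprehension.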

glory9211
  • Thank you so much for that - was getting confused no end, despite knowing it was simple! – Lillap Sep 03 '19 at 11:13
  • The xpath/css selectors are used to get the elements, and we call .get(), .getall() or .extract_first() to get the data inside (read about the differences in the documentation). Forgetting that call is just a common mistake ;) – glory9211 Sep 03 '19 at 11:17
  • On to the next dumb question ...

        def parse(self, response):
            for shoe in response.css('.item'):
                yield {
                    'name': shoe.xpath('//h3[@class="product-name"]/a/text()').get().strip(),
                    'price': shoe.xpath('//p[@class="special-price"]/span[@id="product-price-26901"]/text()').get().strip(),
                }

    I was expecting it to return data for the 12 items on the page, but instead I get the first entry 12 times. What am I missing about the iteration?! Sorry again for being so rubbish and missing the obvious! – Lillap Sep 03 '19 at 15:10
  • Or can I use strip() with getall() ? It seems to all be going wrong! – Lillap Sep 03 '19 at 15:29
  • I think this is because you are combining css and xpath selectors as answered here https://stackoverflow.com/questions/9005170/css-selector-inside-xpath though *I might be wrong* since I am not proficient in using xpaths – glory9211 Sep 03 '19 at 18:47
  • But the fix to your problem using only css selectors is:
    ```
    def parse(self, response):
        for shoe in response.css('.item'):
            yield {
                'name': shoe.css('.product-name a::text').get().strip(),
                'price': shoe.css('.special-price .price::text').get().strip(),
            }
    ```
    – glory9211 Sep 03 '19 at 18:48
  • Thank you so much for all your assistance, it's really very much appreciated. – Lillap Sep 05 '19 at 08:08