How to add non-ASCII characters in XPath, in Scrapy

Question

I have the following XPath:

bathroom = response.xpath(“.//div[1][contains(., 'Baños’)]/text()").extract_first()

And I get this error:

ValueError: All strings must be XML compatible: Unicode or ASCII, no NULL bytes or control characters

I've tried the solutions given in these other similar questions:

Filtering out certain bytes in python

Scrapy xpath utf-8 literals

but none of them resolved my problem!

Note: when trying the solution from the first link, I replaced 'input_string' with, say, word = "baños", and got an error like "the function has one argument, 2 given..."

Can anyone help?

score 1 · Accepted Answer · answered Nov 26 '16 at 02:55


Besides the literal Baños, your code snippet contains invalid string delimiters (a typographic double quote and a typographic single quote), which will cause a different error:

bathroom = response.xpath(“.//div[1][contains(., 'Baños’)]/text()").extract_first()
                          ^                            ^
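The two marked characters are typographic "smart" quotes: U+201C opens the expression and U+2019 closes the Baños literal, whereas Python and XPath only accept the ASCII quotes " and '. As a hedged sketch, the hypothetical helpers below (find_smart_quotes and straighten_quotes are illustrative names, not part of Scrapy or lxml) locate such characters in a line of code and replace them with their ASCII equivalents:

```python
# Typographic ("smart") quotes that editors and word processors often
# substitute for the ASCII quote characters that Python and XPath require.
SMART_QUOTES = {
    "\u201c": '"',  # LEFT DOUBLE QUOTATION MARK
    "\u201d": '"',  # RIGHT DOUBLE QUOTATION MARK
    "\u2018": "'",  # LEFT SINGLE QUOTATION MARK
    "\u2019": "'",  # RIGHT SINGLE QUOTATION MARK
}


def find_smart_quotes(text):
    """Return (index, character) pairs for every smart quote in text."""
    return [(i, ch) for i, ch in enumerate(text) if ch in SMART_QUOTES]


def straighten_quotes(text):
    """Replace each smart quote with its ASCII equivalent."""
    return text.translate(str.maketrans(SMART_QUOTES))


# The line from the question, written with escapes so the quotes are visible.
line = "bathroom = response.xpath(\u201c.//div[1][contains(., 'Ba\u00f1os\u2019)]/text()\").extract_first()"
print(find_smart_quotes(line))   # positions of the two carets above
print(straighten_quotes(line))   # same line with plain ASCII quotes
```

Run on the question's line, it reports indices 26 and 55, the two positions marked with carets.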

Converting the entire XPath expression to a Unicode string, as suggested in the second link, and fixing the two quotes pointed out above should fix the initial errors. Below is a quick test using lxml (which Scrapy uses under the hood):

>>> from lxml import etree
>>> root = etree.fromstring('<root/>')
>>> root.xpath(u".//div[1][contains(., 'Baños')]/text()")
[]
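Why the u prefix matters: under Python 2, a plain "..." literal is a byte string, so in a UTF-8 source file the ñ in Baños becomes the two bytes 0xC3 0xB1, and a byte string containing non-ASCII bytes is exactly what the ValueError in the question rejects. The u"..." prefix makes it a Unicode string instead (in Python 3 every "..." literal is already Unicode, so the prefix is redundant). The hypothetical is_xml_compatible function below is a rough approximation of the rule as worded in the error message, not lxml's actual implementation:

```python
def is_xml_compatible(value):
    """Rough, hypothetical approximation of the check described by lxml's
    "All strings must be XML compatible" ValueError.

    - bytes: accepted only if every byte is ASCII
    - str:   any character except NUL and other C0 control characters,
             with tab, newline and carriage return still allowed
    """
    if isinstance(value, bytes):
        try:
            value = value.decode("ascii")
        except UnicodeDecodeError:
            # Non-ASCII bytes, e.g. the UTF-8 encoding of "ñ" (0xC3 0xB1)
            return False
    return all(ch in "\t\n\r" or ord(ch) >= 0x20 for ch in value)


expr = ".//div[1][contains(., 'Baños')]/text()"
print(is_xml_compatible(expr.encode("utf-8")))  # like a Python 2 byte string
print(is_xml_compatible(expr))                  # a Unicode string
print(is_xml_compatible("Ba\x00os"))            # contains a NULL character
```

The first call fails and the second passes, which mirrors the difference between the question's original expression and the u"..." version in the test above.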

har07

I've tried what you suggested, but I still get this error: `ValueError: XPath error: Invalid expression in .//div[1][contains(., 'Ba\xf1os')]/text()` – wj127 Nov 26 '16 at 03:07
I tested this in the actual Scrapy shell with the following expression and got no error (check what differs in your actual code, or copy and paste this code and run it on your machine): `r = response.xpath(u".//div[1][contains(., 'Baños')]/text()").extract_first()` – har07 Nov 26 '16 at 03:11
OK, here is my full XPath (I trimmed it a bit because it was too long, but it's basically the same): `bathroom = response.xpath(u".//*[@id='details']/div/div/div/div/div[4]/div/div[3]/div[1][contains(., 'Baños')]/div[contains(., 'Baños')]/div[contains(., 'Baños')])]/div/span[3]/span/text()").extract_first()` – wj127 Nov 26 '16 at 03:14
This part contains too many closing brackets (the trailing `)]`), hence the Invalid XPath error: `/div[contains(., 'Baños')]/div[contains(., 'Baños')])]` – har07 Nov 26 '16 at 03:19
Yeah, you're right... I should definitely go to bed after this mistake of mine. xD Thanks a lot for your help! :)
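A stray bracket like this is easy to miss in a long expression. A minimal sketch of a checker, assuming only that XPath 1.0 string literals are delimited by a matching ' or " with no escape sequences (first_unbalanced is a hypothetical helper name, not a Scrapy or lxml function), scans the expression, skips quoted literals, and reports the index of the first bracket that has no partner:

```python
PAIRS = {")": "(", "]": "["}


def first_unbalanced(expr):
    """Return the index of the first unmatched or mismatched bracket, or None.

    Characters inside '...' or "..." string literals are skipped, since
    XPath string literals may legitimately contain brackets.
    """
    stack = []    # (opening char, index) pairs still waiting for a close
    quote = None  # quote character of the literal we are inside, if any
    for i, ch in enumerate(expr):
        if quote:
            if ch == quote:
                quote = None  # literal ends at the matching quote
        elif ch in "'\"":
            quote = ch
        elif ch in "([":
            stack.append((ch, i))
        elif ch in ")]":
            if not stack or stack[-1][0] != PAIRS[ch]:
                return i  # closer with no matching opener
            stack.pop()
    if stack:
        return stack[-1][1]  # innermost opener that was never closed
    return None


# The expression from the comment above.
expr = (
    ".//*[@id='details']/div/div/div/div/div[4]/div/div[3]"
    "/div[1][contains(., 'Baños')]"
    "/div[contains(., 'Baños')]"
    "/div[contains(., 'Baños')])]"
    "/div/span[3]/span/text()"
)
pos = first_unbalanced(expr)
print(pos, repr(expr[pos]))
```

On wj127's expression it points at the stray `)` in `'Baños')])]`; deleting the trailing `)]` makes the check return None.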