I am scraping an XML document like this:
>>> response.xpath("//ul[@class='meta-info d-flex flex-wrap align-items-center list-unstyled justify-content-around']/li[position()=2]/text()").extract()
and is giving me the following output:
['\n ', '\n\t\t\t ', '\n\t\t\t\t24 Feb, 2019 ', '\n ', '\n\t\t\t ', '\n\t\t\t\t24 Feb, 2019 ', '\n ', '\n\t\t\t ', '\n\t\t\t\t24 Feb, 2019 ', '\n ', '\n\t\t\t ', '\n\t\t\t\t24 Feb, 2019 ', '\n ', '\n\t\t\t ', '\n\t\t\t\t24 Feb, 2019 ', '\n ', '\n\t\t\t ', '\n\t\t\t\t24 Feb, 2019 ', '\n ', '\n\t\t\t ', '\n\t\t\t\t24 Feb, 2019 ', '\n ', '\n\t\t\t ', '\n\t\t\t\t24 Feb, 2019 ', '\n ', '\n\t\t\t ', '\n\t\t\t\t24 Feb, 2019 ', '\n ', '\n\t\t\t ', '\n\t\t\t\t24 Feb, 2019 ', '\n ', '\n\t\t\t ', '\n\t\t\t\t24 Feb, 2019 ', '\n ', '\n\t\t\t ', '\n\t\t\t\t24 Feb, 2019 ', '\n ', '\n\t\t\t ', '\n\t\t\t\t24 Feb, 2019 ', '\n ', '\n\t\t\t ', '\n\t\t\t\t23 Feb, 2019 ', '\n ', '\n\t\t\t ', '\n\t\t\t\t24 Feb, 2019 ', '\n ', '\n\t\t\t ', '\n\t\t\t\t24 Feb, 2019 ', '\n ', '\n\t\t\t ', '\n\t\t\t\t24 Feb, 2019 ', '\n ', '\n\t\t\t ', '\n\t\t\t\t24 Feb, 2019 ', '\n ', '\n\t\t\t ', '\n\t\t\t\t24 Feb, 2019 ', '\n ', '\n\t\t\t ', '\n\t\t\t\t24 Feb, 2019 ', '\n ', '\n\t\t\t ', '\n\t\t\t\t24 Feb, 2019 ']
But I do not want any fields that are either newlines, tabs or whitespaces, so I am trying to use the normalize-space()
function, as follows:
>>> response.xpath("normalize-space(//ul[@class='meta-info d-flex flex-wrap align-items-center list-unstyled justify-content-around']/li[position()=2]/text())").extract()
But I am getting a null output:
['']
What is happening here?