
I'm trying to extract values from the table on this site: https://www.geonames.org/search.html?q=&country=IT

For example, I want to extract the name 'Rome', and I used this code:

import requests
import lxml.html

html = requests.get('https://www.geonames.org/search.html?q=&country=IT')
doc = lxml.html.fromstring(html.content)

table_body = doc.xpath('//*[@id="search"]/table')[0]

cities = table_body.xpath('//*[@id="search"]/table/tbody/tr[3]/td[2]/a[1]/text()')

Everything seems OK to me, but when I print it the result is:

>>> print(cities)
[]

I really have no idea what the problem could be. Does anyone have a suggestion?

gergiu
  • What do you want to scrape, exactly? – sentence May 05 '19 at 21:56
  • Possible duplicate of [Why does this xpath fail using lxml in python?](https://stackoverflow.com/questions/23900348/why-does-this-xpath-fail-using-lxml-in-python) – ggorlen May 05 '19 at 22:22

2 Answers

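If you're looking to get "Rome", you need to omit tbody from the XPath. That element is inserted by the browser (which is why it shows up in dev tools and in XPaths copied from them), but it isn't present in the raw HTML returned by the request, and that raw HTML is what lxml parses.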
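
To see the difference without hitting the network, here is a small offline sketch. It uses hypothetical markup shaped like the question's table (a table with no tbody in the raw HTML, with "Rome" in the second row here rather than the third) and compares three paths: the browser-style path with tbody, the same path without it, and a variant with "//" between table and tr that should match whether or not a tbody is present.

```python
import lxml.html

# Hypothetical markup: like a typical server response, it has no <tbody>.
RAW_HTML = """
<div id="search">
  <table>
    <tr><th>#</th><th>Name</th></tr>
    <tr><td>1</td><td><a href="#">Rome</a></td></tr>
  </table>
</div>
"""

doc = lxml.html.fromstring(RAW_HTML)

# Path as a browser would show it, including the browser-added <tbody>.
with_tbody = doc.xpath('//*[@id="search"]/table/tbody/tr[2]/td[2]/a[1]/text()')
# Same path without <tbody>, matching the tree lxml actually built.
without_tbody = doc.xpath('//*[@id="search"]/table/tr[2]/td[2]/a[1]/text()')
# "//" between table and tr finds the rows with or without a <tbody>.
either = doc.xpath('//*[@id="search"]/table//tr[2]/td[2]/a[1]/text()')

print(with_tbody)
print(without_tbody)
print(either)
```

The first list comes back empty, exactly as in the question, while the other two contain 'Rome'.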

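Additionally, the extra line table_body = doc.xpath('//*[@id="search"]/table')[0] is redundant: an XPath that starts with // is evaluated from the document root even when you call it on table_body, so you can search directly from doc.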
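
If you do want to search only inside an element you have already selected, the path has to start with ".//" instead of "//". A minimal sketch on hypothetical markup with two tables (the table id "other" and the cell values are made up for illustration):

```python
import lxml.html

# Hypothetical page: an unrelated table before the one we select.
PAGE = """
<html><body>
  <table id="other"><tr><td>Paris</td></tr></table>
  <div id="search">
    <table><tr><td>Rome</td></tr></table>
  </div>
</body></html>
"""

doc = lxml.html.fromstring(PAGE)
table = doc.xpath('//*[@id="search"]/table')[0]

# A leading "//" ignores the context element and scans the whole document.
absolute = table.xpath('//td/text()')
# A leading ".//" only looks below the selected table element.
relative = table.xpath('.//td/text()')

print(absolute)
print(relative)
```

Here absolute picks up both 'Paris' and 'Rome', while relative only returns 'Rome'.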

import requests
import lxml.html

html = requests.get('https://www.geonames.org/search.html?q=&country=IT')
doc = lxml.html.fromstring(html.content)
print(doc.xpath('//*[@id="search"]/table/tr[3]/td[2]/a[1]/text()')[0]) # => Rome
ggorlen

Here is a simple script that extracts all the city names on that page:

import requests
import lxml.html

html = requests.get('https://www.geonames.org/search.html?q=&country=IT')
doc = lxml.html.fromstring(html.content)
# Corrected XPath: select the results table by its class; "//" before td avoids depending on <tbody>.
cities = doc.xpath("//table[@class='restable']//td[a][2]/a[1]/text()")
for city in cities:
    print(city)
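
To illustrate how the predicate td[a][2] behaves, here is an offline sketch on hypothetical markup (the assumption is that each row has a link in its first cell and the place name as the first link in its second cell; the real page may differ). Note that predicate order matters: td[a][2] is the second cell in a row that contains a link, whereas td[2][a] would be the second cell, provided it contains a link.

```python
import lxml.html

# Hypothetical results table; the real geonames markup may differ.
RESULTS = """
<table class="restable">
  <tr><th></th><th>Name</th></tr>
  <tr><td><a href="/map/1">map</a></td><td><a href="/1">Rome</a> <a href="/1/wiki">wiki</a></td></tr>
  <tr><td><a href="/map/2">map</a></td><td><a href="/2">Milan</a></td></tr>
</table>
"""

doc = lxml.html.fromstring(RESULTS)

# td[a] keeps only cells containing an <a>; [2] picks the second such cell
# in each row; a[1] is the first link inside that cell.
cities = doc.xpath("//table[@class='restable']//td[a][2]/a[1]/text()")
for city in cities:
    print(city)
```

On this markup the loop prints Rome and then Milan; the header row is skipped because it has no td cells.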
supputuri
  • Thank you a lot, it is really helpful :) – gergiu May 07 '19 at 18:16
  • If you feel the issue is resolved, please accept the answer by clicking on the check mark below the down vote button on the left hand side of my answer. Feel free to up vote. – supputuri May 07 '19 at 21:03