
When I enter something for 'cn', the script queries a website and gives me back a table with multiple rows.

from lxml import html
import requests

cn = input('CN: ')

find_page = requests.get('search query' + cn + '')
tree = html.fromstring(find_page.content)

# //tr[2]/td[2]/a/text() is first row after <th>
com = tree.xpath('//tr[2]/td[2]/a/text()')

print('COM:', com)

This code prints only the first row of the table, at XPath location //tr[2], but I need all the other table rows as well: //tr[3]/td[2]/a/text(), //tr[4]/td[2]/a/text(), //tr[...]/td[2]/a/text()
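To see why only one row comes back, here is a small self-contained sketch that runs two expressions against an inline HTML table. The table content is made up for illustration (loosely modelled on the names later in this question), not taken from the real site. In `//tr[2]`, the `[2]` is a position predicate, so only the second `tr` matches; a `position() >= 2` predicate selects the second row and everything after it in one expression.

```python
from lxml import html

# Hypothetical sample table: the first row is a <th> header, as the
# comment in the question suggests for the real page.
SAMPLE = """<table>
  <tr><th>No</th><th>Name</th></tr>
  <tr><td>1</td><td><a href="/folder/DAP">DAP</a></td></tr>
  <tr><td>2</td><td><a href="/folder/DAPA">DAPA</a></td></tr>
  <tr><td>3</td><td><a href="/folder/DAP-FOOD">DAP FOOD</a></td></tr>
</table>"""

tree = html.fromstring(SAMPLE)

# [2] is a position predicate, so only the second <tr> is matched.
only_second = tree.xpath('//tr[2]/td[2]/a/text()')

# position() >= 2 keeps the second row and every row after it,
# i.e. //tr[2], //tr[3], //tr[4], ... in a single expression.
from_second_on = tree.xpath('//tr[position() >= 2]/td[2]/a/text()')

print('only //tr[2]:', only_second)
print('rows 2 and later:', from_second_on)
```

Here `only_second` holds just DAP while `from_second_on` holds all three names. Because this sample header row has no `<td>` at all, plain `//tr/td[2]/a/text()` would return the same three names.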

EDIT:

After managing to get all the items from the table, I have a result such as COM: ['DAP', 'DAPA', 'DAP FOOD'], and each of these has an href. I can follow and scrape only the first link (DAP), but not the others (DAPA and DAP FOOD).

from lxml import html
import requests

cn = input('CN: ')

find_page = requests.get('search query' + cn + '')
tree = html.fromstring(find_page.content)

# //tr/td[2]/a/text() takes the link text from column 2 of every row
com = tree.xpath('//tr/td[2]/a/text()')

link = tree.xpath('//tr/td[2]/a/@href')[0]
link = str(link)

com_link = ('website' + link)
page = requests.get(com_link)
tree = html.fromstring(page.content)

postal_code = tree.xpath('//span[@itemprop="postalCode"]/text()')[0]

print('COM:', com)
print('Postal Code', postal_code)

How can I open each of DAP, DAPA and DAP FOOD and get the postal_code from each one?
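One way to get a postal code for every result is to drop the `[0]` and loop over all the `<a>` elements in column 2, fetching each linked page in turn. This is a sketch under assumptions: the detail pages are assumed to use the same `<span itemprop="postalCode">` markup the question relies on, `BASE_URL` is a hypothetical stand-in for the elided `'website'` prefix, and `postal_codes` is a hypothetical helper name. The page fetch is passed in as a function so the parsing can be tried offline.

```python
from urllib.parse import urljoin

from lxml import html

# Hypothetical base URL: the question elides the real site as 'website'.
BASE_URL = 'https://example.com/'


def postal_codes(search_html, fetch):
    """Return {name: postal_code} for every linked row in the results table.

    `fetch` is any callable that takes a URL and returns the page content,
    e.g. lambda url: requests.get(url).content
    """
    tree = html.fromstring(search_html)
    result = {}
    # Loop over every <a> with an href in column 2 instead of taking only [0].
    for a in tree.xpath('//tr/td[2]/a[@href]'):
        name = a.text_content().strip()
        # urljoin combines the base URL and a relative href like /folder/DAP.
        url = urljoin(BASE_URL, a.get('href'))
        detail = html.fromstring(fetch(url))
        codes = detail.xpath('//span[@itemprop="postalCode"]/text()')
        # Store None when a detail page has no postal code span.
        result[name] = codes[0].strip() if codes else None
    return result
```

With the question's variables, a call might look like `postal_codes(find_page.content, lambda url: requests.get(url).content)`. `urljoin` is used instead of string concatenation so that a relative href such as `/folder/DAP` is joined to the base URL without doubled or missing slashes.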

  • use `all_rows = tree.xpath("//tr")` and a `for` loop: `for row in all_rows[1:]: row.xpath(".//td")` – furas Dec 29 '20 at 03:56
  • or try without `[2]` but with slicing `[1:]`: `tree.xpath('//tr/td[2]/a/text()')[1:]` – furas Dec 29 '20 at 04:00
  • [What is the XPath to select a range of nodes?](https://stackoverflow.com/questions/3354987/what-is-the-xpath-to-select-a-range-of-nodes) – furas Dec 29 '20 at 04:01
  • @furas `tree.xpath('//tr/td[2]/a/text()')[0:]` works and gives me everything I need from the table, thank you. – Henry Wyrick Dec 29 '20 at 22:05
  • Why do you use `[0]`? – furas Dec 30 '20 at 01:26
  • @furas Without `[0]`, link is a list such as `['/folder/DAP']`, so joining it with the website gives website['/folder/DAP']. With `[0]` it becomes website/folder/DAP. – Henry Wyrick Dec 30 '20 at 01:43
  • It only shows that you are not using the XPath `'//tr/td[2]/a/@href'` effectively: it should give you all links at once (similar to `'//tr/td[2]/a/text()'`). It may mean you have to do it a different way, as in my first comment: first find all rows, then search for the text, link and postcode in every row separately. – furas Dec 30 '20 at 01:53
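The row-by-row approach from the first comment can be sketched as follows. The sample table is hypothetical, and `rows` is an illustrative helper name, not part of lxml. The key point is that an XPath starting with `./` is evaluated relative to the current row, so `[0]` there picks that row's own link rather than the first link in the whole table.

```python
from lxml import html

# Hypothetical results table, modelled on the names in the question.
SAMPLE = """<table>
  <tr><th>No</th><th>Name</th></tr>
  <tr><td>1</td><td><a href="/folder/DAP">DAP</a></td></tr>
  <tr><td>2</td><td><a href="/folder/DAPA">DAPA</a></td></tr>
  <tr><td>3</td><td><a href="/folder/DAP-FOOD">DAP FOOD</a></td></tr>
</table>"""


def rows(tree):
    """Yield (name, href) for each data row, handling one row at a time."""
    all_rows = tree.xpath('//tr')
    # all_rows[1:] skips the header row, as in the first comment.
    for row in all_rows[1:]:
        # './' searches only inside this row, so [0] is this row's link.
        names = row.xpath('./td[2]/a/text()')
        hrefs = row.xpath('./td[2]/a/@href')
        if names and hrefs:
            yield names[0].strip(), hrefs[0]


for name, href in rows(html.fromstring(SAMPLE)):
    print(name, href)
```

Inside the loop, each `href` could then be fetched to read that row's postal code, which keeps the name, link and postcode of one row together.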

1 Answer


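Changing com = tree.xpath('//tr[2]/td[2]/a/text()') to com = tree.xpath('//tr/td[2]/a/text()') made it work: without the [2] on tr, the expression matches the second cell of every row instead of only the second row.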

from lxml import html
import requests

cn = input('CN: ')

find_page = requests.get('search query' + cn + '')
tree = html.fromstring(find_page.content)

# //tr/td[2]/a/text() takes the link text from column 2 of every row;
# the <th> header row has no <td>, so it is skipped
com = tree.xpath('//tr/td[2]/a/text()')

print('COM:', com)
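The answer's expression returns only data rows because it requires an `<a>` inside the second `<td>` of a row, and a header row normally has neither. The sketch below checks this against two hypothetical header styles (`<th>` cells, and plain `<td>` cells without links); `names` is an illustrative helper, and the tables are made up rather than copied from the real site.

```python
from lxml import html


def names(table_html):
    """Apply the answer's XPath to a table and return the link texts."""
    return html.fromstring(table_html).xpath('//tr/td[2]/a/text()')


# Hypothetical data rows shared by both tables.
ROWS = """
  <tr><td>1</td><td><a href="/folder/DAP">DAP</a></td></tr>
  <tr><td>2</td><td><a href="/folder/DAPA">DAPA</a></td></tr>
"""
TH_TABLE = '<table><tr><th>No</th><th>Name</th></tr>' + ROWS + '</table>'
TD_TABLE = '<table><tr><td>No</td><td>Name</td></tr>' + ROWS + '</table>'

# Neither header row has an <a> inside a second <td>,
# so both calls return only the names from the data rows.
print(names(TH_TABLE))
print(names(TD_TABLE))
```

If a header row did contain a link in its second cell, it would show up in the result, and slicing with `[1:]` as suggested in the comments would drop it.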