Python lxml print each table row

Question

When I enter something for 'cn', the script queries a website and gives me a table with multiple rows:

from lxml import html
import requests

cn = input('CN: ')

find_page = requests.get('search query' + cn + '')
tree = html.fromstring(find_page.content)

# //tr[2]/td[2]/a/text() is first row after <th>
com = tree.xpath('//tr[2]/td[2]/a/text()')

print('COM:', com)

This code prints only the first row of the table, at XPath location //tr[2], but I need to print all the other table rows as well: //tr[3]/td[2]/a/text(), //tr[4]/td[2]/a/text(), //tr[...]/td[2]/a/text()
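A minimal sketch of the difference, run against a made-up table (the HTML string is an assumption standing in for the real search results, which the question does not show): `//tr[2]` selects only the second row, `//tr` without a position predicate selects every row, and `position() > 1` skips the header row explicitly.

```python
from lxml import html

# Hypothetical stand-in for the search-results page; the real markup is not shown.
SAMPLE = """
<table>
  <tr><th>No</th><th>Name</th></tr>
  <tr><td>1</td><td><a href="/folder/DAP">DAP</a></td></tr>
  <tr><td>2</td><td><a href="/folder/DAPA">DAPA</a></td></tr>
  <tr><td>3</td><td><a href="/folder/DAP-FOOD">DAP FOOD</a></td></tr>
</table>
"""

tree = html.fromstring(SAMPLE)

# Only the second <tr> (the first data row after the <th> header row).
first_only = tree.xpath('//tr[2]/td[2]/a/text()')

# Every <tr>; the header row has no <td>, so it contributes nothing.
all_rows = tree.xpath('//tr/td[2]/a/text()')

# Explicitly skip the first row by its position among its sibling rows.
from_second = tree.xpath('//tr[position() > 1]/td[2]/a/text()')

print(first_only)   # ['DAP']
print(all_rows)     # ['DAP', 'DAPA', 'DAP FOOD']
print(from_second)  # ['DAP', 'DAPA', 'DAP FOOD']
```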

EDIT:

After solving how to get all items from the table, I get a result such as COM: ['DAP', 'DAPA', 'DAP FOOD'], and each of these has an href. I can follow and scrape only the first link (DAP), but I can't scrape the others (DAPA and DAP FOOD).

from lxml import html
import requests

cn = input('CN: ')

find_page = requests.get('search query' + cn + '')
tree = html.fromstring(find_page.content)

# //tr[2]/td[2]/a/text() is first row after <th>
com = tree.xpath('//tr/td[2]/a/text()')

link = tree.xpath('//tr/td[2]/a/@href')[0]
link = str(link)

com_link = 'website' + link
page = requests.get(com_link)
tree = html.fromstring(page.content)

postal_code = tree.xpath('//span[@itemprop="postalCode"]/text()')[0]

print('COM:', com)
print('Postal Code', postal_code)

How can I visit DAP, DAPA, and DAP FOOD and get the postal_code from each one?
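One way to follow every link, sketched under assumptions: drop the `[0]` so `@href` returns all links, turn each relative href into an absolute URL with `urllib.parse.urljoin`, and download and parse each page in a loop. `BASE_URL` is a hypothetical stand-in for the elided `'website'` prefix, the helper names are made up for illustration, and the download step is passed in as a `fetch` function so the parsing can be tried without network access.

```python
from urllib.parse import urljoin

from lxml import html

# Hypothetical stand-in for the 'website' prefix elided in the question.
BASE_URL = 'https://example.com/'


def company_links(results_html):
    """Return (name, absolute URL) pairs for every link in the results table."""
    tree = html.fromstring(results_html)
    names = tree.xpath('//tr/td[2]/a/text()')
    hrefs = tree.xpath('//tr/td[2]/a/@href')  # no [0]: keep every href
    # zip() assumes each <a> has exactly one text node, so names and hrefs line up.
    return [(str(name), urljoin(BASE_URL, href)) for name, href in zip(names, hrefs)]


def postal_code(detail_html):
    """Return the text of the first postalCode span, or None if there is none."""
    codes = html.fromstring(detail_html).xpath('//span[@itemprop="postalCode"]/text()')
    return str(codes[0]) if codes else None


def postal_codes(results_html, fetch):
    """Download each company page with fetch(url) and map name -> postal code."""
    return {name: postal_code(fetch(url)) for name, url in company_links(results_html)}


# Usage with requests (not run here):
#   find_page = requests.get('search query' + cn + '')
#   print(postal_codes(find_page.content, lambda url: requests.get(url).content))
```

Passing `fetch` in keeps the scraping logic separate from the HTTP calls; with `requests` it is simply a function that returns `requests.get(url).content`.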

Use `all_rows = tree.xpath("//tr")` and a `for`-loop `for row in all_rows[1:]: row.xpath(".//td")` – furas, Dec 29 '20 at 03:56
Or try it without `[2]` but with slicing `[1:]`: `tree.xpath('//tr/td[2]/a/text()')[1:]` – furas, Dec 29 '20 at 04:00
[What is the XPath to select a range of nodes?](https://stackoverflow.com/questions/3354987/what-is-the-xpath-to-select-a-range-of-nodes) – furas, Dec 29 '20 at 04:01
@furas `tree.xpath('//tr/td[2]/a/text()')[0:]` works and gives me everything I need from the table, thank you. – Henry Wyrick, Dec 29 '20 at 22:05
@furas When I don't use `[0]`, printing the link gives, for example, `Link ['/folder/DAP']`, and when I prepend the website it becomes `website['/folder/DAP']`. With `[0]` it is `website/folder/DAP`. – Henry Wyrick, Dec 30 '20 at 01:43
That only shows that your XPath `'//tr/td[2]/a/@href'` is used ineffectively, because it should give all the links at once (similar to `'//tr/td[2]/a/text()'`). It may mean you have to do it a different way, as in my first comment: first find all rows, and then search for the text, link, and postcode in every row separately. – furas, Dec 30 '20 at 01:53
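A sketch of the row-by-row approach from the first comment, assuming the name link sits in the second cell of each row as in the question's XPath: the name and the link are read from the same `<tr>`, so they cannot drift apart when a row has no link, and `[1:]` skips the header row. The function name and the `base_url` parameter are illustrative, not part of the original code.

```python
from urllib.parse import urljoin

from lxml import html


def rows_with_links(results_html, base_url):
    """Yield (name, absolute URL) for each data row, read from that row only."""
    tree = html.fromstring(results_html)
    for row in tree.xpath('//tr')[1:]:  # [1:] skips the header row
        # A relative XPath ('./...') searches only inside the current row.
        links = row.xpath('./td[2]//a[@href]')
        if not links:
            continue  # this row's second cell has no link
        link = links[0]
        yield link.text_content().strip(), urljoin(base_url, link.get('href'))
```

The postal-code request from the question can then run inside the same loop, once per yielded URL.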

Henry Wyrick · Answer 1 · 2020-12-30T00:56:22.350


I changed com = tree.xpath('//tr[2]/td[2]/a/text()') to com = tree.xpath('//tr/td[2]/a/text()'), and now it works:

from lxml import html
import requests

cn = input('CN: ')

find_page = requests.get('search query' + cn + '')
tree = html.fromstring(find_page.content)

# //tr/td[2]/a/text() matches the link text in the second cell of every row;
# the header row has only <th> cells, so it contributes nothing
com = tree.xpath('//tr/td[2]/a/text()')

print('COM:', com)

edited Dec 30 '20 at 00:56

answered Dec 29 '20 at 22:07

you should get the same result without `[0:]` – furas Dec 29 '20 at 23:07
Yes, you are right. It works without [0:]. – Henry Wyrick Dec 30 '20 at 00:55
@furas I have edited the question; could you check it and see whether you know how to solve it? – Henry Wyrick Dec 30 '20 at 01:20
Why do you use `[0]` for the links? Without `[0]` you should get all values. – furas Dec 30 '20 at 01:26
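The result mentioned in the comments (`website['/folder/DAP']` without `[0]`) comes from calling `str()` on the whole list of hrefs. A small illustration with example values, where `BASE_URL` is a hypothetical stand-in for `'website'`: stringifying the list pastes its representation onto the prefix, while joining each href on its own yields one URL per link.

```python
from urllib.parse import urljoin

# Example values shaped like the result of tree.xpath('//tr/td[2]/a/@href').
hrefs = ['/folder/DAP', '/folder/DAPA']
BASE_URL = 'https://example.com/'  # hypothetical stand-in for 'website'

# str() on the whole list embeds the list's repr, brackets and quotes included.
broken = 'website' + str(hrefs)
print(broken)  # website['/folder/DAP', '/folder/DAPA']

# Building one URL per href instead keeps every link usable.
urls = [urljoin(BASE_URL, href) for href in hrefs]
print(urls)  # ['https://example.com/folder/DAP', 'https://example.com/folder/DAPA']
```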