Extracting table contents from a request response with xpath

Question

I'm writing a wrapper library on top of requests, using something along the lines of the following:

import requests
from lxml.html import fromstring

URL = "https://test"
COOKIES = {"test": "AAAAAAAAAAAAA"}
HEADERS = {"Connection": "close", "Upgrade-Insecure-Requests": "1", "User-Agent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/62.0.3202.89 Safari/537.36", "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8", "Accept-Encoding": "gzip, deflate", "Accept-Language": "en-US,en;q=0.9"}

response = requests.get(URL, headers=HEADERS, cookies=COOKIES)
source = fromstring(response.content)

table = source.xpath("")

The response contains a lot of content and I'm trying to isolate the items in a table. The relevant part of the response is:

<table border="0" cellpadding="0" cellspacing="0" width="100%" class="dialogHdrTbl" summary="Layout table"><thead><tr align="left"><th class="groupHdr"><div class="groupHdr">View Client List</div></th></tr></thead><tbody><tr><td height="1"></td></tr></tbody></table><table width="100%" cellpadding="0" cellspacing="0" border="0" summary="Data table" class="dialogTbl"><tbody><tr class="altRwFlse"><td height="25" headers="hdr1" class="c1">TEST CLIENT 0</td><td height="25" headers="hdr2"><a class="dialogLnk" href="javascript:opener.document.contactForm.company.value=&quot;TEST CLIENT 1&quot;;self.close();" target="">Select</a></td></tr><tr class="altRwTre"><td height="25" headers="hdr1" class="c1">TEST CLIENT 2</td>

I'm trying to output:

TEST CLIENT 0 TEST CLIENT 1 TEST CLIENT 2

I've looked at using XPath for this (based on this post: How to parse text from a html table element), but I don't quite understand how to form my XPath query. What am I missing here?

score 1 · Accepted Answer · answered Mar 28 '18 at 16:45


You can try the code below to get the required output:

[i.split('value="')[-1].replace('";self.close();', '') for i in source.xpath('//table[@summary="Data table"]//td[not(a)]/text() | //table[@summary="Data table"]//td/a/@href')]

The output should be:

['TEST CLIENT 0', 'TEST CLIENT 1', 'TEST CLIENT 2']
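For anyone who wants to try this without the live site, here is a minimal, self-contained sketch of the same idea. It assumes the markup looks like the snippet in the question; the closing tags at the end of SAMPLE_HTML are my own guess, because the pasted response is cut off. The names SAMPLE_HTML and extract_clients are made up for illustration, and I swapped the split/replace for a regular expression so the href parsing does not depend on the exact `;self.close();` suffix.

```python
import re

from lxml.html import fromstring

# Trimmed copy of the markup from the question. The tags closing the last
# row and the table are assumed, since the original paste is truncated.
SAMPLE_HTML = (
    '<table summary="Layout table" class="dialogHdrTbl"><thead><tr>'
    '<th class="groupHdr"><div class="groupHdr">View Client List</div></th>'
    '</tr></thead><tbody><tr><td height="1"></td></tr></tbody></table>'
    '<table summary="Data table" class="dialogTbl"><tbody>'
    '<tr class="altRwFlse"><td headers="hdr1" class="c1">TEST CLIENT 0</td>'
    '<td headers="hdr2"><a class="dialogLnk" href="javascript:opener.document.'
    'contactForm.company.value=&quot;TEST CLIENT 1&quot;;self.close();" '
    'target="">Select</a></td></tr>'
    '<tr class="altRwTre"><td headers="hdr1" class="c1">TEST CLIENT 2</td></tr>'
    '</tbody></table>'
)


def extract_clients(html):
    """Return client names found in the table marked summary="Data table"."""
    tree = fromstring(html)
    # Union of two node sets, returned in document order:
    #   1. text of cells that do not contain a link
    #   2. href attributes of links inside cells
    nodes = tree.xpath(
        '//table[@summary="Data table"]//td[not(a)]/text()'
        ' | //table[@summary="Data table"]//td/a/@href'
    )
    names = []
    for node in nodes:
        # hrefs carry the name as value="..."; plain cell text is used as-is.
        match = re.search(r'value="([^"]*)"', node)
        names.append(match.group(1) if match else node.strip())
    return names


clients = extract_clients(SAMPLE_HTML)
print(" ".join(clients))
```

With the real response you would pass response.content to the function instead of SAMPLE_HTML. Restricting both paths to the table with summary="Data table" is what keeps the header table's cells out of the result.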

Andersson


It seems you have not been active for quite some time, sir Andersson. However, in your spare time, please take a look at [this post](https://stackoverflow.com/questions/49581078/cant-trigger-a-click-on-a-certain-link-using-selenium/49586305?noredirect=1#comment86186968_49586305). – SIM Apr 06 '18 at 06:08
@Topto, yep, I was on vacation. I checked your new question. [Check this answer](https://stackoverflow.com/questions/49581078/cant-trigger-a-click-on-a-certain-link-using-selenium/49688791#49688791) – Andersson Apr 06 '18 at 08:45
