
I am struggling with a page on a secure website, so I am attaching a snapshot of what I am working on. The XPath that matches the rows of the table (13 rows) is:

//div[@id='Section3']

However, the data is not inside the elements matched by that XPath; it comes after them, in 9 columns.

How can I refer to those elements? I am not sure whether they count as children, ancestors, or something else.

Here's the HTML for that page (I couldn't include it in the question):

https://pastebin.com/hEq8K75C

Here's the snapshot, which may clarify the issue: [image]

How can I use the variable j in lines like these?

Dim x As Long, i As Long, j As Long
x = .FindElementsByXPath("//div[@id='Section3']").Count
For i = 1 To x
    For j = 1 To 9
       Cells(i, j).Value = .FindElementByXPath("//div[@id='Section3']/following-sibling::div[following-sibling::div[@id='Section3'][count(preceding-sibling::div[@id='Section3'])=" & i & " and count(following-sibling::div[@id='Section3'])=" & x - (i + 1) & "]][" & j & "]").Text
    Next j
Next i
YasserKhalil

1 Answer


Try the code below:

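from selenium.webdriver.common.by import By

# `driver` is an already-initialized WebDriver with the page loaded.
# Each div with id "Section3" marks the start of one record.
counter = len(driver.find_elements(By.ID, "Section3"))

# Match divs with exactly {0} "Section3" markers before them and {1} after them.
xpath = "//div[@id='Section3']/following-sibling::div[count(preceding-sibling::div[@id='Section3'])={0} and count(following-sibling::div[@id='Section3'])={1}]"

for i in range(counter):
    print('\nRow #{} \n'.format(i + 1))
    _xpath = xpath.format(i + 1, counter - (i + 1))
    cells = driver.find_elements(By.XPATH, _xpath)
    for cell in cells:
        # Raises NoSuchElementException if a div contains no <td>.
        value = cell.find_element(By.XPATH, ".//td").text
        print(value)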
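
The row arithmetic in the loop above can be checked without a browser. This is a minimal sketch, assuming 13 `Section3` markers as in the question; `row_xpath` is a hypothetical helper name, not part of Selenium:

```python
# Template from the answer: a cell div belongs to row k when it has exactly
# k "Section3" markers before it and (total - k) markers after it.
XPATH_TEMPLATE = (
    "//div[@id='Section3']/following-sibling::div"
    "[count(preceding-sibling::div[@id='Section3'])={0}"
    " and count(following-sibling::div[@id='Section3'])={1}]"
)


def row_xpath(row, total):
    """Build the XPath for a 1-based row number (hypothetical helper)."""
    if not 1 <= row <= total:
        raise ValueError("row must be between 1 and total")
    return XPATH_TEMPLATE.format(row, total - row)


total_markers = 13  # assumption: the page has 13 Section3 divs
for row in (1, 2, total_markers):
    print(row, row_xpath(row, total_markers))
```

For a cell div the two counts always sum to the total number of markers. For a marker div they sum to one less, which is why the markers themselves are never matched by this template.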
JaSON
  • Thanks a lot. I got 119 results, but I expect 13 rows * 9 columns = 117. Can you help me extract each record as a row? – YasserKhalil Nov 18 '20 at 12:38
  • @YasserKhalil I guess `//div[@id='Section3']/following-sibling::div[not(@id) and following-sibling::div[@id='Section3']]` will skip the 2 unwanted results. To extract the rows separately we probably need a loop, but I'm not good at VBA – JaSON Nov 18 '20 at 12:41
  • Thanks a lot. With this XPath I got 109, not 117, which is weird. Can I get an XPath for each record separately, so that I can extract the data from each record? – YasserKhalil Nov 18 '20 at 12:54
  • @YasserKhalil Try `//div[@id='Section3']/following-sibling::div[following-sibling::div[@id='Section3'][count(preceding-sibling::div[@id='Section3'])={x} and count(following-sibling::div[@id='Section3'])=13-{x+1}]]`. Just replace `{x}` with `1`, `2`, `3`... to get each row – JaSON Nov 18 '20 at 13:27
  • Thank you very much. I tried the XPath as you described, substituting the values like this: `//div[@id='Section3']/following-sibling::div[following-sibling::div[@id='Section3'][count(preceding-sibling::div[@id='Section3'])=1 and count(following-sibling::div[@id='Section3'])=12]]`, but it doesn't select the record. I am sure I have misinterpreted your reply. – YasserKhalil Nov 18 '20 at 15:02
  • This is what I have managed so far: `//div[@id='Section3']/following-sibling::div[following-sibling::div[@id='Section3'][count(preceding-sibling::div[@id='Section3'])=1]]` – YasserKhalil Nov 18 '20 at 15:04
  • @YasserKhalil, I mean like `//div[@id='Section3']/following-sibling::div[following-sibling::div[@id='Section3'][count(preceding-sibling::div[@id='Section3'])=1 and count(following-sibling::div[@id='Section3'])=11]]` (note that you used `...count(following-sibling::div[@id='Section3'])=12` while it should be `count(following-sibling::div[@id='Section3'])=11`). The point is to increase the count for preceding siblings by 1 each time (1, 2, 3, 4...) while decreasing the count for following siblings by 1 (11, 10, 9, 8...) – JaSON Nov 18 '20 at 15:08
  • Thank you very much for your great support. – YasserKhalil Nov 18 '20 at 15:12
  • I have used the XPath in my code but don't know how to loop through the columns. Please have a look at the updated post. – YasserKhalil Nov 18 '20 at 15:18
  • @YasserKhalil Honestly, I can't help you much with VBA code. I'm quite good at Python (and know a little JS), but not VBA. The outer loop seems to be OK (you might need to change `For i = 1 To x` to `For i = 1 To x - 1`), but for the inner loop I guess you should get the list of divs and then loop through each one, rather than using `For j = 1 To 9` and `[" & j & "]`. – JaSON Nov 18 '20 at 15:35
  • I am curious to see Python code for such a web page. Can you help me with that? – YasserKhalil Nov 18 '20 at 16:19
  • @JaSON Can you show me Python code for that? – YasserKhalil Dec 08 '20 at 06:56
  • @YasserKhalil can you re-share the HTML code (pastebin removed it) and clarify what exactly your desired output is? – JaSON Dec 08 '20 at 08:44
  • I have updated the main post. The desired output is the table in the HTML (the problem is that it is not a plain table; it is nested, with a lot of span tags and tables). – YasserKhalil Dec 08 '20 at 08:54
  • @YasserKhalil check the updated answer. There is an exception at the end, but all the values seem to be printed, so you can ignore the exception – JaSON Dec 08 '20 at 10:04
  • Thank you very much. Can you provide me with the whole code, as I am a beginner at Python? I am using requests and BeautifulSoup: `soup = BeautifulSoup(res_report.content, 'lxml')`. How can I use XPath with BeautifulSoup? I couldn't find a clue. – YasserKhalil Dec 08 '20 at 10:10
  • @YasserKhalil AFAIK BeautifulSoup doesn't support XPath. You need to try [`lxml.html`](https://lxml.de/lxmlhtml.html). It supports XPath and IMO it's more flexible and convenient – JaSON Dec 08 '20 at 10:15
  • I used `from selenium import webdriver`, put the HTML in a variable, created the driver with `driver = webdriver.Chrome("C:/chromedriver.exe")`, and loaded the content with `driver.get("data:text/html;charset=utf-8,{html_content}".format(html_content=html_content))`. Finally I ran the code you offered. I didn't get any errors, but I didn't get any output either. Any idea what may be wrong? `print(counter)` gives 0 – YasserKhalil Dec 08 '20 at 10:36
  • @YasserKhalil I put the HTML source from pastebin into a local file and did `driver.get("file:///" + '/path/to/local.html')`; everything works fine – JaSON Dec 08 '20 at 11:01
  • Thank you very, very much. Now it works, but after Row #18 there are errors. How can I convert the data to a DataFrame? I have posted a new question, as this is overwhelming. Thanks a lot for your great support. Here's the link: https://stackoverflow.com/questions/65197974/ – YasserKhalil Dec 08 '20 at 11:20
  • @YasserKhalil yes, I saw that error. I guess there is something wrong with the logic, but I don't think it affects your output. You can simply use `try`/`except` to get rid of it. Unfortunately, I don't have much experience in creating data frames, so I can't help you with that new issue. Also, I think it will be hard for community users to help you, as you didn't provide the output data in the question, so users don't know what exactly to convert – JaSON Dec 08 '20 at 11:31
  • The output is straightforward: open the local.html file and you will see a single table with the data needed. In any case, thank you very much. I will try to think of a way. Best and kind regards. – YasserKhalil Dec 08 '20 at 11:44
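
For readers without Selenium or lxml, the same grouping can be sketched with the standard library alone. This is a swapped-in technique, not the XPath from the answer: `xml.etree.ElementTree` does not support `following-sibling` or `count()`, so the code walks the sibling divs in order and starts a new row at each `Section3` marker. The sample markup and the name `group_rows` are made up for illustration; the real page is more deeply nested and, unlike this sample, may not parse as well-formed XML.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup: each "Section3" div starts a record and
# the sibling divs that follow it hold one cell each.
SAMPLE = """
<body>
  <div id="Section3"></div>
  <div><table><tr><td>A1</td></tr></table></div>
  <div><table><tr><td>B1</td></tr></table></div>
  <div id="Section3"></div>
  <div><table><tr><td>A2</td></tr></table></div>
  <div><table><tr><td>B2</td></tr></table></div>
</body>
"""


def group_rows(root, marker_id="Section3"):
    """Split the child divs of `root` into rows at each marker div."""
    rows = []
    for div in root.findall("div"):
        if div.get("id") == marker_id:
            rows.append([])          # a marker opens a new row
        elif rows:                   # ignore divs before the first marker
            cell = div.find(".//td")
            if cell is not None:     # skip divs without a <td> instead of raising
                rows[-1].append((cell.text or "").strip())
    return rows


rows = group_rows(ET.fromstring(SAMPLE))
for number, row in enumerate(rows, start=1):
    print("Row #{}: {}".format(number, row))
```

Checking `cell is not None` is this sketch's counterpart to the `try`/`except` suggested in the comments: a div without a `<td>` is skipped rather than raising an error.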