
I am trying to scrape some data from this page: https://www.blockchain.com/btc/block/000000000019d6689c085ae165831e934ff763ae46a2a6c172b3f1b60a8ce26f

I am absolutely perplexed...

The absolute path for the "Number Of Transactions" is /html[1]/body[1]/div[1]/div[2]/div[1]/table[1]/tbody[1]/tr[2]/td[1]

When I run the code:

print(driver.find_element(By.XPATH, "/html/body/div/div[2]/div[1]/table/tbody/tr[2]/td[1]").text)

The driver returns "No Inputs (Newly Generated Coins)"

which has the path /html[1]/body[1]/div[1]/div[3]/div[1]/table[1]/tbody[1]/tr[2]/td[1]/b[1]

I find it difficult to understand why the absolute path selects a different value.

So when I run the code:

print(driver.find_element(By.XPATH, "/html[1]/body[1]/div[1]/div[3]/div[1]/table[1]/tbody[1]/tr[2]/td[1]/b[1]").text)

It returns that the element doesn't exist(?)

charlie090
  • I would ask you to go through what is the difference between absolute xpath and relative xpath. [Xpath tutorial](https://www.seleniumeasy.com/selenium-tutorials/xpath-tutorial-for-selenium) – Madhan Oct 27 '19 at 00:01
  • I used / not // – charlie090 Oct 27 '19 at 01:14

2 Answers

1

It is indeed strange; both Firefox and Chrome show the same xpath for that element, but if you get the page using requests, or look at its source, there is no <tbody> element in there. The correct xpath expression to get the number of transactions (i.e., 1) is

   /html/body/div/div/div/table[1]/tr[2]/td[2]/text()

As an explanation why it works, try this:

import lxml.html  # "import lxml" alone does not load the html submodule
import requests

url = "https://www.blockchain.com/btc/block/000000000019d6689c085ae165831e934ff763ae46a2a6c172b3f1b60a8ce26f"
resp = requests.get(url)

# Parse the HTML exactly as the server sent it (no <tbody> in this source)
tree = lxml.html.fromstring(resp.text)
print(tree.xpath("/html/body/div/div/div/table[1]/tr[2]/td[2]/text()"))

Output:

['1', '\n ']

And, since @Guy is right and you should avoid absolute paths (and your situation is the perfect example why), you can get the same output by using

print(tree.xpath("//table/tr[2]/td[2]/text()"))
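
To see the `<tbody>` effect without fetching the page, here is a minimal sketch. It assumes a made-up, heavily simplified table (not the real page markup) and uses the standard library's `xml.etree.ElementTree` instead of lxml, so it only illustrates the idea: a path that contains `tbody` finds nothing when the source has no `<tbody>`.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup: like the raw page source, it has no <tbody>.
SOURCE = """
<html><body><div><div><div>
<table>
  <tr><th colspan="2">Summary</th></tr>
  <tr><td>Number Of Transactions</td><td>1</td></tr>
</table>
</div></div></div></body></html>
""".strip()

root = ET.fromstring(SOURCE)

# Path in the style the browser dev tools report, with <tbody> in it
with_tbody = root.findall(".//table/tbody/tr[2]/td[2]")
# The same path with the <tbody> step removed
without_tbody = root.findall(".//table/tr[2]/td[2]")

print(len(with_tbody))                    # 0
print([td.text for td in without_tbody])  # ['1']
```
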
Jack Fleeting
  • I believe using the source is not correct. Maybe I forgot to mention that I am using Selenium + Python. The webdriver works on the elements as they are after various manipulations, not just as delivered from the server. Anyway, I used your path, which still resulted in no elements found. Were you able to find the element? – charlie090 Oct 27 '19 at 03:12
  • @CharlesLee - See edited answer as to how to get to the element using requests and lxml. – Jack Fleeting Oct 27 '19 at 10:32
1

.text returns all the text under the WebElement, including the text of its descendants. The first xpath returns the element <td class="txtd hidden-phone mobile-f12 stack-mobile">, and the "No Inputs (Newly Generated Coins)" text is part of it.

The second xpath doesn't work because it's incorrect: div[3] needs to be div[2], since there are only 2 <div>s in this location.

*Using absolute xpath is bad practice, avoid it.
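
As a rough illustration of the first point, here is a minimal sketch on a made-up cell (not the real page markup). It uses the standard library's `xml.etree.ElementTree`, whose `itertext()` only loosely approximates Selenium's `.text` (Selenium returns the rendered, visible text), so treat it as an analogy rather than the webdriver's actual behaviour.

```python
import xml.etree.ElementTree as ET

# Hypothetical cell, loosely modelled on the one discussed above.
cell = ET.fromstring(
    '<td class="txtd"><b>No Inputs (Newly Generated Coins)</b></td>'
)

# The <td> itself holds no text of its own ...
own_text = cell.text
# ... but collecting text from all descendants picks up the <b> content.
all_text = "".join(cell.itertext()).strip()

print(own_text)  # None
print(all_text)  # No Inputs (Newly Generated Coins)
```
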

Guy
  • I was told the opposite about not using absolute xpath. I was told that relative xpath is computationally taxing in terms of resources. – charlie090 Oct 27 '19 at 09:52
  • @CharlesLee they were wrong. Both have the same efficiency (which is bad compared to other selectors, by the way), but absolute xpath is very fragile: the smallest change in the html will invalidate it. – Guy Oct 27 '19 at 09:57
  • The absolute xpath /html/body/div/div[2]/div[1]/table/tbody/tr[2]/td[1] should, in terms of DOM structure, be the second tr and first td of the table with class="table table-striped". I don't understand why you would say it is ...As far as I can see it is not a descendant either... – charlie090 Oct 27 '19 at 09:58
  • @CharlesLee This element is the element you are talking about. The `<b>` with the text is a descendant (a direct child, actually) of this element. – Guy Oct 27 '19 at 10:02
  • of this xpath? /html/body/div/div[2]/div[1]/table/tbody/tr[2]/td[1] – charlie090 Oct 27 '19 at 10:03
  • @CharlesLee Yes. – Guy Oct 27 '19 at 10:05
  • wait what...can you check this out? https://imgur.com/a/qNp3P6b tell me what you think...or can you explain how you got to your conclusion? I also used browser xpath plugins.... – charlie090 Oct 27 '19 at 10:10
  • @CharlesLee this is a different element, you need to replace `div[2]` with `div[1]` to get it. I suggest you start using the console drawer to locate elements in the developer tools. – Guy Oct 27 '19 at 10:14
  • okay....changing div 2 to div 1 does give the expected answer...but I don't understand why it's div 1....looking at the console: https://imgur.com/a/ziANs5L – charlie090 Oct 27 '19 at 10:25
  • @CharlesLee I don't know where you see this html structure but this isn't what I see when I check the site. Maybe there are some cookies/cache changing it, open it in incognito. – Guy Oct 27 '19 at 10:30
  • Just an unrelated question. I see your point with absolute xpaths being inflexible. But what is the most efficient selector? – charlie090 Oct 27 '19 at 14:13
  • @CharlesLee There are several threads about this, like [this](https://stackoverflow.com/questions/41001439/what-is-the-most-efficient-selector-to-use-with-findelement) and [this](https://stackoverflow.com/questions/38716233/in-selenium-webdriver-which-is-better-in-terms-of-performance-linktext-or-css) – Guy Oct 28 '19 at 05:13
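
The fragility point raised in the comments can be sketched as follows. This is only an illustration on made-up markup, using the standard library's `xml.etree.ElementTree`; because ElementTree does not accept a leading `/` on an element, the "full" path is written relative to the `<html>` root, which spells out the same steps. The names `FULL_PATH`, `SHORT_PATH` and `texts` are hypothetical helpers for this example.

```python
import xml.etree.ElementTree as ET

TABLE = (
    "<table><tr><th>Summary</th></tr>"
    "<tr><td>Number Of Transactions</td><td>1</td></tr></table>"
)

# The same table before and after a small layout change (one extra wrapper).
before = ET.fromstring(f"<html><body><div>{TABLE}</div></body></html>")
after = ET.fromstring(
    f"<html><body><div><section>{TABLE}</section></div></body></html>"
)

FULL_PATH = "body/div/table/tr[2]/td[2]"  # every step from the root spelled out
SHORT_PATH = ".//table/tr[2]/td[2]"       # anchored on the table only


def texts(root, path):
    """Return the text of every element matched by path under root."""
    return [el.text for el in root.findall(path)]


print(texts(before, FULL_PATH))   # ['1']
print(texts(after, FULL_PATH))    # []
print(texts(before, SHORT_PATH))  # ['1']
print(texts(after, SHORT_PATH))   # ['1']
```

The fully spelled-out path stops matching as soon as one wrapper element is added, while the shorter path keeps working.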