How to extract specific data from a website's form

Question

I'm trying to get specific data from the website below. The page has only one form, "Token Address", but I don't know how to extract the numbers shown after "Buy Tax:" and "Sell Tax:". I just need the numbers, without the percent symbol. What is the cleanest way to extract this information using Python?

My failed attempt:

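import requests
from bs4 import BeautifulSoup

# XPath expressions copied from the browser's developer tools
xPath_buy = "/html/body/div/div[1]/div/p[5]/text()[1]"
xPath_sell = "/html/body/div/div[1]/div/p[5]/text()[2]"

token = "0x40619dc9f00ea34e51d96b6ec5d8a6ad75457434"
url = "https://honeypot.is/?address=" + token

def tax(token):
    url = "https://honeypot.is/?address=" + token
    HTML = requests.get(url)
    soup = BeautifulSoup(HTML.text, 'html.parser')
    # Prints None: BeautifulSoup does not evaluate XPath, so this searches for a
    # tag literally named "div style" that has an attribute called "xpath"
    text = soup.find('div style', attrs={'xpath': '//*[@id="shitcoin"]/div/p[5]/text()[1]'})
    return text

buy_tax = tax(token)
print(buy_tax)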

Calculated by JS: see view-source:https://honeypot.is/?address=0x40619dc9f00ea34e51d96b6ec5d8a6ad75457434 (line #250) – balderman Sep 12 '21 at 16:54

Answer

Both the buy tax and the sell tax are inserted into the page dynamically by JavaScript. You can confirm this with print(soup): the percentages are not in the HTML that requests downloads. The JavaScript that builds them can be seen below.

let gasdiv = '<p>Gas used for Buying: ' + numberWithCommas(buyGasUsed)
    + '<br>Gas used for Selling: ' + numberWithCommas(sellGasUsed) + '</p>';
document.getElementById('shitcoin').innerHTML =
    '<div style="max-width: 100%;" class="ui compact success message">'
    + '<div class="header">Does not seem like a honeypot.</div>'
    + '<p>This can always change! Do your own due diligence.</p>'
    + '<p>Address: ' + addressToOutput + '</p>'
    + '<p id="token-info">' + tokenName + ' (' + tokenSymbol + ')' + '</p>'
    + maxdiv + gasdiv
    + '<p>Buy Tax: ' + buy_tax + '%<br>Sell Tax: ' + sell_tax + '%</p></div>';
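You can also check this in code instead of reading print(soup) by eye. Search the HTML for a "Buy Tax: <number>%" pattern. The sketch below uses a hypothetical helper, `find_rendered_taxes`. It assumes the rendered text follows the template above. Run against the template source, which is what requests receives, it finds nothing. The numbers only exist after a browser has executed the script. The sample strings are illustrative, not data from the site.

```python
import re

# Assumed pattern, based on the JavaScript template above:
# "Buy Tax: <number>%<br>Sell Tax: <number>%"
TAX_PATTERN = re.compile(
    r"Buy Tax:\s*(-?\d+(?:\.\d+)?)%.*?Sell Tax:\s*(-?\d+(?:\.\d+)?)%",
    re.DOTALL,
)


def find_rendered_taxes(html: str) -> list[tuple[str, str]]:
    """Return every (buy, sell) pair of numeric taxes present in the HTML."""
    return TAX_PATTERN.findall(html)


# Roughly what requests/BeautifulSoup sees: script source, not numbers.
static_html = "'<p>Buy Tax: ' + buy_tax + '%<br>Sell Tax: ' + sell_tax + '%</p>'"
# What a browser would show after the script runs (example values).
rendered_html = "<p>Buy Tax: 10%<br>Sell Tax: 12.5%</p>"

print(find_rendered_taxes(static_html))    # []
print(find_rendered_taxes(rendered_html))  # [('10', '12.5')]
```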

You'll need to use Selenium instead, and do something like this:

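from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

driver = webdriver.Chrome()  # needs Chrome plus a matching chromedriver

def tax(token):
    url = "https://honeypot.is/?address=" + token
    driver.get(url)  # load the page in a real browser so its JavaScript runs
    # find_element must match an element, not a text() node, so wait for the <p> holding both taxes
    paragraph = WebDriverWait(driver, 20).until(
        EC.presence_of_element_located((By.XPATH, '//*[@id="shitcoin"]//p[contains(., "Buy Tax:")]'))
    )
    return paragraph.text  # "Buy Tax: ...%" and "Sell Tax: ...%" on separate lines

buy_tax = tax(token)
print(buy_tax)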
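The element's `.text` still includes the labels and percent signs. The question asks for the bare numbers. Below is a minimal sketch of that last step using only the standard library. It assumes the text looks like `Buy Tax: 10%` and `Sell Tax: 12.5%` on separate lines, which is how a `<br>` normally shows up in Selenium's `.text`. `parse_taxes` is a hypothetical helper name.

```python
import re


def parse_taxes(text: str) -> dict[str, float]:
    """Turn 'Buy Tax: 10%\\nSell Tax: 12.5%' into {'buy': 10.0, 'sell': 12.5}.

    Assumes each tax appears as '<Label> Tax: <number>%'.
    """
    taxes = {}
    for label, value in re.findall(r"(\w+) Tax:\s*(-?\d+(?:\.\d+)?)\s*%", text):
        taxes[label.lower()] = float(value)  # drop the "%" and convert to a number
    return taxes


print(parse_taxes("Buy Tax: 10%\nSell Tax: 12.5%"))  # {'buy': 10.0, 'sell': 12.5}
```

In practice you would pass it the paragraph element's text, e.g. `parse_taxes(element.text)`, and read `["buy"]` and `["sell"]` from the result.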

edited Sep 15 '21 at 18:36 by Peter Mortensen
answered Sep 10 '21 at 18:35 by isopach

Sure, but the issue is that I don't get numeric values when I run print(soup); I get 'buy_tax' and 'sell_tax' instead. Could you please write complete code that returns only the numeric values of both taxes for a given token address? – BJonas88 Sep 12 '21 at 08:03
It is impossible to do this with BeautifulSoup because the values are simply not there; they are loaded dynamically. You need Selenium, as I said, and then do driver.find_elements(By.XPATH, xPath_buy). @BJonas88 – isopach Sep 12 '21 at 10:29
...nice! That's close to what I'm looking for. However, what is `driver`? It seems to be undefined. I tried `By` instead, but it returns: ```Traceback (most recent call last): File "c:/Users/Owner/Documents/BlockchainPy/MyScripts/scratchTest.py", line 44, in <module> buy_tax = tax(token0) File "c:/Users/Owner/Documents/BlockchainPy/MyScripts/scratchTest.py", line 41, in tax text = By.find_elements(By.XPATH, xPath_buy) AttributeError: type object 'By' has no attribute 'find_elements' ``` – BJonas88 Sep 13 '21 at 08:01
@BJonas88 My bad, it was supposed to be find_element without the 's'. Can you try again? – isopach Sep 13 '21 at 13:03
Hey, so I have: ` driver = webdriver.Chrome(executable_path = "C:\\Users\\Owner\\Downloads\\chromedriver_win32\\chromedriver.exe") def tax(token): url = "https://honeypot.is/?address=" + token HTML = requests.get (url) text = driver.find_element(By.XPATH, xPath_buy) return text buy_tax = tax(token0) print(buy_tax) ` and I get: `[25572:2544:0914/000512.836:ERROR:device_event_log_impl.cc(214)] [00:05:12.836] USB: usb_device_handle_win.cc:1048 Failed to read descriptor from node connection: A device attached to the system is not functioning. (0x1F) ` – BJonas88 Sep 14 '21 at 05:38
See https://stackoverflow.com/questions/64927909/failed-to-read-descriptor-from-node-connection-a-device-attached-to-the-system @BJonas88 – isopach Sep 14 '21 at 07:30
Looks like is solved, isn't it? As for , that would be a different question; consider posting it separately after you've accepted this answer. @BJonas88 – isopach Sep 15 '21 at 02:16
Yeah bro, I'd say it's only partially solved, as I can't confirm it works without getting the values for the buy/sell tax. I'm using the XPath ``` //*[@id="shitcoin"]/div/p[6]/text()[1] ``` for the buy_tax value, but it throws an error. Can you please confirm it's correct? – BJonas88 Sep 15 '21 at 19:34
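An error is expected here. In XPath, `text()[1]` selects a text node rather than an element. Selenium's `find_element` can only return elements, so such an expression typically fails with an `InvalidSelectorException`. The standard-library sketch below parses a hypothetical rendered snippet, modelled on the JavaScript template in the answer, with `xml.etree.ElementTree`. It shows what the two text nodes are: `text()[1]` is the `<p>`'s own text and `text()[2]` is the tail after the `<br>`. It also locates the paragraph by its content. Positional paths such as `p[5]` versus `p[6]` shift whenever the template inserts an extra block like `maxdiv`. In Selenium the equivalent move is to select the `<p>` element, for example with `//*[@id="shitcoin"]//p[contains(., "Buy Tax:")]` (an assumption about the rendered markup), and split its `.text`.

```python
import xml.etree.ElementTree as ET

# Hypothetical rendered markup (example values, not real data for any token).
rendered = '<div id="shitcoin"><div><p>Buy Tax: 10%<br/>Sell Tax: 12.5%</p></div></div>'

root = ET.fromstring(rendered)
# Locate the paragraph by its content instead of by position.
paragraph = next(p for p in root.iter("p") if (p.text or "").startswith("Buy Tax:"))

first_text_node = paragraph.text              # what XPath text()[1] selects
second_text_node = paragraph.find("br").tail  # what XPath text()[2] selects

# Neither is an element, so take the paragraph's text and keep what follows
# each colon, dropping the percent sign.
numbers = [node.split(":")[1].strip().rstrip("%") for node in (first_text_node, second_text_node)]

print(first_text_node)   # Buy Tax: 10%
print(second_text_node)  # Sell Tax: 12.5%
print(numbers)           # ['10', '12.5']
```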
