
I have been trying to use BeautifulSoup (BS4) to read the value of an input field on a webpage. However, no matter what I try, it always returns as 'None' although the element certainly exists. Here is a sample of the HTML:

<td class="formArea"><table border="0" cellspacing="2" cellpadding="2">
<tr>
<td class="main">Street Address:</td>
<td class="main">
<input type="text" name="entry_street_address" id="entry_street_address" value="1234 Example Ln" maxlength="64" required>&nbsp;<span class="fieldRequired">* Required</span></td>

So I attempt to use BS4 to grab the value of 'entry_street_address':

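import time
import requests
from bs4 import BeautifulSoup

session = requests.Session()
r = session.get("https://sampleurl.com/wheremydataisstored")  # placeholder URL

time.sleep(3)  # wait, hoping the page finishes loading
soup = BeautifulSoup(r.content, 'html5lib')
info = soup.find('input', {'id': 'entry_street_address'}).get('value')
print(info)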

Unfortunately, this always returns:

AttributeError: 'NoneType' object has no attribute 'get'

This happens no matter which parser I use (html5lib, lxml, or html.parser) and no matter how long I .sleep() to wait for the page to load. I'm not really sure where it's going wrong!

  • *"However, no matter what I try, it always returns as 'None' although the element certainly exists."* How do you know that the element exists? – Stef Jan 04 '22 at 23:02
  • These two similar questions had the answer "the element indeed doesn't exist": https://stackoverflow.com/questions/40146128/beautifulsoup-returns-none-even-though-the-element-exists and https://stackoverflow.com/questions/51529482/beautiful-soup-find-returns-none – Stef Jan 04 '22 at 23:03
  • @Stef I've already looked at both of those. The first answer seemed to work because of the time.sleep(), which I've added in the hope that it would help; it didn't. I've checked the page source, it's there, and I can't imagine why I'm having trouble extracting the value here. – Jeremy Gorden Jan 04 '22 at 23:06
  • Did you notice the answer uses selenium in addition to beautifulsoup? – Stef Jan 04 '22 at 23:07
  • What these comments are hinting at is the very likely possibility that the DOM of the page whose HTML you're trying to parse, when viewed in a browser, is being modified and populated asynchronously using JavaScript, which is a very common practice. When you make your `requests.get` call, the response will only contain a barebones HTML template, which does not actually contain the element you're looking for (since this template would normally be viewed in a browser and populated by JavaScript). Why don't you try printing `r.text` to see if the element is really there? – Paul M. Jan 04 '22 at 23:35
  • Please share the page you're scraping. As others have said, if you're looking in the browser's live element inspector, that HTML might have been injected by JS after the page loaded, and isn't in the requested static markup. – ggorlen Jan 05 '22 at 02:50
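
The check suggested in the comments (is the element actually in the markup the server returned?) can be sketched offline as follows. This is only an illustration, not the asker's code: `InputValueFinder`, `find_input_value`, and the two sample strings are hypothetical, and the sketch uses Python's standard-library `html.parser` instead of BeautifulSoup so that the result does not depend on which parser BeautifulSoup is given. In practice `r.text` would be passed in place of the sample strings.

```python
from html.parser import HTMLParser


class InputValueFinder(HTMLParser):
    """Records the value attribute of the first <input> with a given id."""

    def __init__(self, element_id):
        super().__init__()
        self.element_id = element_id
        self.found = False
        self.value = None

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; valueless attributes
        # such as "required" arrive with value None.
        attr_map = dict(attrs)
        if not self.found and tag == "input" and attr_map.get("id") == self.element_id:
            self.found = True
            self.value = attr_map.get("value")


def find_input_value(html, element_id):
    """Return (found, value) for the <input> with this id in raw HTML text."""
    finder = InputValueFinder(element_id)
    finder.feed(html)
    finder.close()
    return finder.found, finder.value


# Hypothetical stand-ins for r.text: a bare template as a server might send it,
# and the markup as it appears in the browser after JavaScript has run.
static_template = '<td class="formArea"><div id="app"></div></td>'
rendered_dom = (
    '<td class="main"><input type="text" name="entry_street_address" '
    'id="entry_street_address" value="1234 Example Ln" maxlength="64" required></td>'
)

print(find_input_value(static_template, "entry_street_address"))  # (False, None)
print(find_input_value(rendered_dom, "entry_street_address"))     # (True, '1234 Example Ln')
```

If the call on the real `r.text` reports `(False, None)`, the element is not in the static response at all, and no choice of parser or `time.sleep()` duration will make `soup.find(...)` return it.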
