I have an HTML file (pulled via curl, to avoid pinging the website with my trials) that contains dog listings. I am interested in the contents of the h3 tags, which hold each dog's name.
from bs4 import BeautifulSoup
# read from previously saved file
path = "petrescue_short.html"
with open(path) as page:
    soup = BeautifulSoup(page.read(), "html.parser")
# print all h3 tags; find_all returns a list (a ResultSet of Tag objects), not an array
h3_headers = soup.find_all("h3")
print("List all h3 header tags:", *h3_headers, sep="\n\n")
This produces the following output:
<h3>
dog1
</h3>
<h3>
dog2
</h3>
...
However, I want to get rid of the tags, or at least of the newlines. I tried all sorts of things, all of which ended in the error message TypeError: 'NoneType' object is not callable.
I also read "How to modify list entries during for loop?", but the list shown there is actually an array.
I sort of understand that lists are not arrays, but isn't there a way to iterate through the list (which I can do) and, if I cannot change a list item in place, at least assign it to another variable and modify that?
I would have thought the following should work:
for i in range(len(h3_headers)):
    h3_item = h3_headers[i]
    h3_item = h3_item.replace('\n', '')
    print(h3_item, sep='\n')
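A likely reason this loop fails: the items returned by find_all are bs4 Tag objects rather than strings, so h3_item.replace is not str.replace. As far as I understand, BeautifulSoup treats attribute access on a Tag as a lookup of a child tag with that name, which would give None here and explain the "not callable" error. A minimal sketch of a workaround, assuming it is acceptable to convert each tag to a string and collect the results in a new list (the one-tag HTML sample and the name cleaned are made up for illustration):

```python
from bs4 import BeautifulSoup

# Hypothetical one-tag sample standing in for the saved file
soup = BeautifulSoup("<h3>\ndog1\n</h3>", "html.parser")
h3_headers = soup.find_all("h3")

cleaned = []
for h3_item in h3_headers:
    # Each item is a bs4 Tag, not a str, so str methods are not
    # available on it directly; show the actual type for reference
    print(type(h3_item).__name__)
    # Convert the tag to its HTML string first, then strip the newlines
    cleaned.append(str(h3_item).replace("\n", ""))

print(*cleaned, sep="\n")
```

This builds a second list instead of modifying h3_headers in place, so the original Tag objects stay available for further parsing.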
How can I achieve the following:
<h3>dog1</h3>
<h3>dog2</h3>
<h3>...</h3>
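One possible way to get this output is sketched below, assuming each h3 contains only text, as in the listing shown earlier. It uses get_text(strip=True) to pull the name out of each tag as a plain string and then rebuilds the one-line form by hand. The inline HTML and the names html, names and one_line are placeholders for illustration, not the real contents of petrescue_short.html.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the contents of petrescue_short.html
html = """
<h3>
dog1
</h3>
<h3>
dog2
</h3>
"""

soup = BeautifulSoup(html, "html.parser")

# get_text(strip=True) returns the tag's text as a plain str with
# surrounding whitespace (including the newlines) removed
names = [h3.get_text(strip=True) for h3 in soup.find_all("h3")]

# Rebuild the one-line form; note this drops any attributes the
# original h3 tags may have carried
one_line = [f"<h3>{name}</h3>" for name in names]

print(*one_line, sep="\n")
```

If only the dog names are needed, the names list can be used directly and the rebuilding step skipped.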