How to extract an address from an HTML file

Question

I am new to the community. I am working on a project that extracts addresses from an HTML file. The specific string I am trying to process is:

<address class="list-card-addr">1867 Central Ave, Augusta, GA 30904</address>

I have tried processing it with manual tools, but I'd like to use Python to process the entire HTML file. Can someone explain how to do this in Python? Thank you in advance.

Use a library called `BeautifulSoup` to parse and collect data. — Pedro Maia, Dec 06 '21 at 01:05

CodeMonkey · Answer 1 · 2021-12-08T16:36:15.260

You can extract the address using BeautifulSoup, which is very handy for accessing elements in HTML and XML documents.

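from bs4 import BeautifulSoup
import requests

# url must be set to the address of the page to scrape
r = requests.get(url)
html = r.text
soup = BeautifulSoup(html, "html.parser")
# find() returns the first matching element, or None if nothing matches
addr = soup.find("address", class_="list-card-addr")
if addr is not None:
    print(addr.text)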

If the target HTML contains multiple addresses, use find_all() and a loop to access every address element.

for addr in soup.find_all("address", class_="list-card-addr"):
    print(addr.text)
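
Since the question is about an HTML file rather than a live URL, the same lookup can be run on markup that is already in memory. The following is a minimal sketch, not a definitive implementation: the helper name `extract_addresses` is made up for illustration, and the sample string is the line from the question. For a page saved to disk, the string could be read with `open()` first (the file name in the comment is hypothetical).

```python
from bs4 import BeautifulSoup


def extract_addresses(html):
    """Return the text of every <address class="list-card-addr"> element."""
    soup = BeautifulSoup(html, "html.parser")
    # get_text(strip=True) trims surrounding whitespace from each match
    return [
        tag.get_text(strip=True)
        for tag in soup.find_all("address", class_="list-card-addr")
    ]


# Sample markup copied from the question. A saved page could be loaded with
# open("page.html", encoding="utf-8").read() instead (hypothetical file name).
sample = '<address class="list-card-addr">1867 Central Ave, Augusta, GA 30904</address>'
addresses = extract_addresses(sample)
print(addresses)
```

Returning a list keeps the single-address and multiple-address cases on the same code path, and an empty list signals that nothing matched.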

score 0 · Answer 2 · answered Dec 06 '21 at 01:13

Use a regular expression to find the addresses.

import re
# html holds the page source as a string
r1 = re.findall(r"<address class=\"?list-card-addr\"?>([^<]+)", html)
print(r1)
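
As a rough, self-contained check of this pattern, the sketch below runs it on the sample line from the question and then on a variant of that line. The variant, with an added `id` attribute, is invented here for illustration only. It shows that the pattern depends on the exact form of the opening tag.

```python
import re

# Same pattern as above, written with single quotes so no escaping is needed
PATTERN = re.compile(r'<address class="?list-card-addr"?>([^<]+)')

# Sample line copied from the question
sample = '<address class="list-card-addr">1867 Central Ave, Augusta, GA 30904</address>'
matches = PATTERN.findall(sample)
print(matches)

# Hypothetical variant: the same element with one extra attribute.
# The pattern expects ">" right after the class value, so nothing is found.
variant = '<address class="list-card-addr" id="a1">1867 Central Ave, Augusta, GA 30904</address>'
missed = PATTERN.findall(variant)
print(missed)
```

The pattern works when every tag is written exactly as in the question. An HTML parser is less sensitive to attribute order, quoting style, or extra attributes.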

Haven't we all learned at this point not to use RegExp in HTML? https://stackoverflow.com/a/1732454/1051677 – Silviu Burcea Aug 22 '23 at 08:47
