How to scrape an address (comma-separated text) using BeautifulSoup in Python

Question

I am trying to scrape the address from the link below:

https://www.yelp.com/biz/rollin-phatties-houston

But I only get the first part of the address (i.e. 1731 Westheimer Rd) instead of the complete, comma-separated address:

1731 Westheimer Rd, Houston, TX 77098

Can anyone help me with this? My code is below:

import bs4 as bs
import urllib.request as url

source = url.urlopen('https://www.yelp.com/biz/rollin-phatties-houston')
soup = bs.BeautifulSoup(source, 'html.parser')

mains = soup.find_all("div", {"class": "secondaryAttributes__09f24__3db5x arrange-unit__09f24__1gZC1 border-color--default__09f24__R1nRO"})
main = mains[0] #First item of mains

address = []
for main in mains:
    try:       
        address.append(main.address.find("p").text)
    except:
        address.append("")

print(address)
# 1731 Westheimer Rd

See how to create a minimal reproducible example. The URL doesn't matter, just the content of the HTML. Make the example as small as possible, and still exhibit the problem you can't solve. — Peter Wood, Dec 19 '20 at 01:09

score 2 · Accepted Answer · answered Dec 19 '20 at 03:38


import requests
import re
from ast import literal_eval


def main(url):
    r = requests.get(url)
    match = literal_eval(
        re.search(r'addressLines.+?(\[.+?])', r.text).group(1))
    print(*match)


main('https://www.yelp.com/biz/rollin-phatties-houston')

Output:

1731 Westheimer Rd Houston, TX 77098
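The output above is space-separated because print(*match) prints the list items with spaces between them. A minimal sketch of the same regex and literal_eval approach that returns the comma-separated form asked for in the question is shown below. It runs against a hard-coded sample string rather than the live page; SAMPLE and extract_address are hypothetical names, and the sample text is an assumption about what the page source looks like.

```python
import re
from ast import literal_eval

# Hypothetical excerpt standing in for the page source (an assumption,
# not copied from the real page).
SAMPLE = '{"addressLines": ["1731 Westheimer Rd", "Houston, TX 77098"]}'


def extract_address(text):
    """Return the address lines joined by ', ', or None if not found."""
    # Same pattern as the answer: the first [...] list after "addressLines".
    found = re.search(r'addressLines.+?(\[.+?])', text)
    if found is None:
        return None
    # literal_eval turns the list literal into a Python list of strings.
    lines = literal_eval(found.group(1))
    return ", ".join(lines)


print(extract_address(SAMPLE))
```

Returning None when the pattern is missing avoids the AttributeError that re.search(...).group(1) would raise if the page layout changes.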

answered Dec 19 '20 at 03:38 by αԋɱҽԃ αмєяιcαη

Awesome - this is just great! Many thanks from Spain ( Andalusia ) – zero Feb 17 '21 at 13:08

score 1 · Answer 2 · answered Dec 19 '20 at 02:54

There is no need to locate the address by inspecting the rendered elements: the data is already embedded in the page inside a script element as a JavaScript object. You can extract it with the following code:

import chompjs
import bs4 as bs
import urllib.request as url

source = url.urlopen('https://www.yelp.com/biz/rollin-phatties-houston')
soup = bs.BeautifulSoup(source, 'html.parser')

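# The index 16 depends on the order of the script elements in the page,
# so it may need adjusting if the page layout changes.
javascript = soup.select("script")[16].string
# Parse the JavaScript object into a Python dict.
data = chompjs.parse_js_object(javascript)
print(data['bizDetailsPageProps']['bizContactInfoProps']['businessAddress'])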

Here is another example showing how to parse JavaScript objects into a dict: https://stackoverflow.com/a/65272779/10153574 — Jerry An, Dec 19 '20 at 02:56

score 1 · Answer 3 · answered Dec 19 '20 at 04:49

The business address shown on the webpage is generated dynamically. If you view the page source, you will find that the restaurant's address is stored in a script element, so you need to extract it from there.

from bs4 import BeautifulSoup
import requests
import json

page = requests.get('https://www.yelp.com/biz/rollin-phatties-houston')
htmlpage = BeautifulSoup(page.text, 'html.parser')
# Collect the script elements that hold JSON data.
scriptelements = htmlpage.find_all('script', attrs={'type':'application/json'})
# The index 2 depends on the order of the script elements in the page.
scriptcontent = scriptelements[2].text
# The JSON is wrapped in an HTML comment; strip the markers before parsing.
scriptcontent = scriptcontent.replace('<!--', '')
scriptcontent = scriptcontent.replace('-->', '')
jsondata = json.loads(scriptcontent)
print(jsondata['bizDetailsPageProps']['bizContactInfoProps']['businessAddress'])

The same approach should work for other business pages, provided the script element is at the same position and the JSON uses the same keys.
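The code above picks the script element by position, which can break if the page adds or reorders script elements. A minimal sketch of a variant that instead scans every application/json script element for the expected keys is shown below. It runs against a small inline HTML string; SAMPLE_HTML and find_business_address are hypothetical names, and the sample markup (including businessAddress being a plain string) is an assumption, not the real page.

```python
import json
from bs4 import BeautifulSoup

# Hypothetical, trimmed-down page (an assumption; the real markup is
# larger and may differ).
SAMPLE_HTML = """
<html><body>
<script type="application/json"><!--{"locale": "en_US"}--></script>
<script type="application/json"><!--{"bizDetailsPageProps": {"bizContactInfoProps": {"businessAddress": "1731 Westheimer Rd, Houston, TX 77098"}}}--></script>
</body></html>
"""


def find_business_address(html):
    """Return the first businessAddress found in a JSON script, else None."""
    soup = BeautifulSoup(html, "html.parser")
    for script in soup.find_all("script", attrs={"type": "application/json"}):
        raw = script.string or ""
        # Strip the HTML comment markers wrapped around the JSON.
        raw = raw.replace("<!--", "").replace("-->", "")
        try:
            data = json.loads(raw)
            return data["bizDetailsPageProps"]["bizContactInfoProps"]["businessAddress"]
        except (json.JSONDecodeError, KeyError, TypeError):
            # Not valid JSON, or not the script we are looking for.
            continue
    return None


print(find_business_address(SAMPLE_HTML))
```

With a live page, the html argument would be page.text from the requests call in the answer above.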
