
I am trying to scrape the following page to extract the bank names and addresses:

URL="https://www.bankmega.com/en/about-us/bank-mega-network/"

I can see the required information inside the script tags. How can I extract it?

import json

import requests
from bs4 import BeautifulSoup

r = requests.get(URL)
soup = BeautifulSoup(r.content, "html.parser")

soup.find_all('script', type="text/javascript")
Searcher

1 Answer


If you are able to select the relevant script tag, the easiest way is probably to search its text for the first occurrence of "[" and "]", since these two mark the boundaries of the list of dictionaries. If you put only that content (including the square brackets) into a separate string, you can use the json library to convert it into a Python object. JavaScript object keys are not quoted, though, so they have to be quoted first to make the string valid JSON. The code below is a bit ugly when it comes to the string cleaning, but it does the job.

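import json
import re

import requests
from bs4 import BeautifulSoup

URL = "https://www.bankmega.com/en/about-us/bank-mega-network/"

r = requests.get(URL)
# Name the parser explicitly to avoid bs4's GuessedAtParserWarning.
soup = BeautifulSoup(r.content, "html.parser")

for element in soup.find_all('script', type="text/javascript"):
    # Pick the script block that holds the data, identified by a line it contains.
    if "$('#table_data_atm').hide();" in element.get_text():
        string_raw = element.get_text()
        # The data is a JavaScript array literal: slice from the first "[" to the first "]".
        first_bracket_open = string_raw.find("[")
        first_bracket_close = string_raw.find("]")
        # JS object keys are unquoted, so quote them to make the text valid JSON.
        cleaned_string = (string_raw[first_bracket_open:first_bracket_close + 1]
                          .replace('city:', '"city":')
                          .replace('lokasi:', '"lokasi":')
                          .replace('alamat:', '"alamat":')
                          .replace("\n", ""))
        # Collapse runs of whitespace (raw string, so "\s" is not read as a string escape).
        cleaned_string = re.sub(r"\s\s+", " ", cleaned_string)
        # Remove trailing commas, which JSON does not allow, and stray tabs.
        cleaned_string = cleaned_string.replace(", },", "},").replace(", ]", "]").replace("\t", " ")
        parsed = json.loads(cleaned_string)
        print(parsed)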
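The chained `.replace()` calls only quote the three keys `city`, `lokasi`, and `alamat`, and the trailing-comma fix depends on the exact whitespace in the page. A slightly more general sketch follows. It assumes the script holds a flat (non-nested) JavaScript array of object literals with bare identifier keys and double-quoted string values. The sample script and its values are made up for illustration, and the helper name `js_array_to_python` is hypothetical. It quotes every bare key with a regex and drops trailing commas before calling `json.loads`:

```python
import json
import re

# Hypothetical script text; the real page's data may be shaped differently.
SAMPLE_SCRIPT = """
$('#table_data_atm').hide();
var data = [
    {city: "Jakarta", lokasi: "Branch A", alamat: "Jl. Contoh No. 1", },
    {city: "Bandung", lokasi: "Branch B", alamat: "Jl. Contoh No. 2", },
];
"""


def js_array_to_python(script_text):
    """Slice out the first [...] literal and parse it as JSON.

    Assumes the array is not nested and that string values are
    double-quoted and contain no ", word:" patterns or brackets.
    """
    start = script_text.find("[")
    end = script_text.find("]", start)
    if start == -1 or end == -1:
        raise ValueError("no [...] literal found")
    literal = script_text[start:end + 1]
    # Quote bare object keys, e.g. {city: -> {"city":  and  , lokasi: -> , "lokasi":
    literal = re.sub(r'([{,]\s*)([A-Za-z_]\w*)\s*:', r'\1"\2":', literal)
    # Drop trailing commas before a closing } or ], which JSON forbids.
    literal = re.sub(r",\s*([}\]])", r"\1", literal)
    return json.loads(literal)


rows = js_array_to_python(SAMPLE_SCRIPT)
for row in rows:
    print(row["lokasi"], "-", row["alamat"])
```

The key-quoting regex is deliberately simple: a string value that itself contains something like `, word:` would be mangled, so check it against the real page text before relying on it.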
C Hecht
  • Hi, thanks for the answer. This works beautifully. Can you tell me how to search the script text for the second occurrence of "[" and "]", since these two mark the boundaries of the second list I want to extract? – Searcher Nov 10 '20 at 08:42
  • You can use the approach in this solution to find the nth occurrence of "[" and "]", and then repeat the procedure I used: https://stackoverflow.com/questions/1883980/find-the-nth-occurrence-of-substring-in-a-string – C Hecht Nov 10 '20 at 08:49
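Following that suggestion, a minimal sketch for pulling out the n-th array might look like this. It assumes the arrays in the script are not nested (so the n-th "[" pairs with the n-th "]") and that no other "[" or "]" appears before them, for example in an index expression like `data[i]`. The helpers `find_nth` and `nth_js_array` and the two-array sample script are hypothetical, and the cleanup uses regexes rather than the chained `.replace()` calls from the answer:

```python
import json
import re

# Hypothetical script with two array literals; the real page may differ.
SAMPLE_SCRIPT = """
var branches = [
    {city: "Jakarta", lokasi: "Branch A", alamat: "Jl. Contoh No. 1", },
];
var atms = [
    {city: "Surabaya", lokasi: "ATM B", alamat: "Jl. Contoh No. 2", },
];
"""


def find_nth(haystack, needle, n):
    """Return the index of the n-th (1-based) occurrence of needle, or -1."""
    index = -1
    for _ in range(n):
        index = haystack.find(needle, index + 1)
        if index == -1:
            break
    return index


def nth_js_array(script_text, n):
    """Parse the n-th [...] literal in script_text (arrays assumed not nested)."""
    start = find_nth(script_text, "[", n)
    end = find_nth(script_text, "]", n)
    if start == -1 or end == -1 or end < start:
        raise ValueError(f"array #{n} not found")
    literal = script_text[start:end + 1]
    # Quote bare object keys and drop trailing commas so json can parse it.
    literal = re.sub(r'([{,]\s*)([A-Za-z_]\w*)\s*:', r'\1"\2":', literal)
    literal = re.sub(r",\s*([}\]])", r"\1", literal)
    return json.loads(literal)


second = nth_js_array(SAMPLE_SCRIPT, 2)
print(second)
```

If the script does contain other brackets, counting occurrences will pick the wrong span; in that case it is safer to anchor the search on the variable name that precedes each array.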