I am learning Python and want to scrape some information from the map navigation container on this page, https://findmasa.com/view/map#b1cc410b, such as the mural id, latitude, longitude, artist name, address, city, and state.

Below is the code I have tried, but the output is always NO DATA. My coding skills are limited, so any help would be sincerely appreciated!

from bs4 import BeautifulSoup
import requests

url = 'https://findmasa.com/view/map#b1cc410b'
response = requests.get(url)
soup = BeautifulSoup(response.content, 'html.parser')

li_element = soup.find('li', id='b1cc410b')

if li_element:
    data_lat = li_element['data-lat']
    data_lng = li_element['data-lng']
    artist_name = li_element.find('a').text
    address = li_element.find_all('p')[1].text
    city = li_element.find_all('p')[2].text

    print('LATITUDE ', data_lat)
    print('LONGITUDE ', data_lng)
    print('ARTIST ', artist_name)
    print('ADDRESS ', address)
    print('CITY ', city)
else:
    print('NO DATA')
Jessie H
  • Consider selenium: https://www.geeksforgeeks.org/selenium-python-tutorial/ – TheTridentGuy supports Ukraine Jul 16 '23 at 19:57
  • StackOverflow is a site for debugging any errors or issues, so it'd be better if you put the problematic code, current output, and expected output, so we can help. Without any details, the question will get closed soon. – Zero Jul 16 '23 at 23:13
  • Web sites today are often dynamic. The initial page code does not contain the map info. Instead, it contains code to load that info and update the page. – Ouroborus Jul 17 '23 at 01:50
  • @Ouroborus Should I consider this website as a dynamic website? My understanding of a dynamic website is that the information changes automatically whenever you open it. How should I get the information hidden behind? – Jessie H Jul 17 '23 at 01:53
  • Yes. You either use a browser as a proxy (Selenium, etc.) or you find and access its API directly. – Ouroborus Jul 17 '23 at 02:03

1 Answer

The information you're looking for is added to the page by JavaScript after the initial load. The requests library only downloads the initial HTML and does not execute JavaScript, so the li element is not in the response. soup.find therefore returns None, the if condition is false, and the else branch prints NO DATA.
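
To see the effect in isolation, here is a minimal sketch that uses only the standard library. The two HTML strings are made up for illustration and are not the real findmasa.com markup; `LiIdCollector` and `li_ids_in` are hypothetical helper names. The first string stands in for what a plain HTTP client receives, the second for the DOM after the scripts have run.

```python
from html.parser import HTMLParser


class LiIdCollector(HTMLParser):
    """Collect the id attribute of every <li> start tag in a document."""

    def __init__(self):
        super().__init__()
        self.li_ids = []

    def handle_starttag(self, tag, attrs):
        if tag == "li":
            li_id = dict(attrs).get("id")
            if li_id is not None:
                self.li_ids.append(li_id)


def li_ids_in(html):
    """Return the ids of all <li> elements found in the given HTML text."""
    parser = LiIdCollector()
    parser.feed(html)
    parser.close()
    return parser.li_ids


# Hypothetical markup (assumption): the list is empty until a script fills it.
INITIAL_HTML = """
<ul id="murals"></ul>
<script>/* fetches mural data and appends list items at runtime */</script>
"""

# Hypothetical markup (assumption): the same list after the script has run.
RENDERED_HTML = """
<ul id="murals">
  <li id="b1cc410b" data-lat="34.102025" data-lng="-118.32694167"></li>
</ul>
"""

print("b1cc410b" in li_ids_in(INITIAL_HTML))   # False
print("b1cc410b" in li_ids_in(RENDERED_HTML))  # True
```

A parser can only find what is in the text it is given, which is why the lookup fails on the initial HTML no matter how the selector is written.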

You can use Selenium instead, which drives a real browser and therefore runs the page's JavaScript.

Here is a solution:

from selenium.webdriver import Chrome
from selenium.webdriver.common.by import By
from selenium.webdriver.support.wait import WebDriverWait
import selenium.webdriver.support.expected_conditions as EC

# Create a Chrome driver instance
driver = Chrome()

url = 'https://findmasa.com/view/map#b1cc410b'
driver.get(url)

# Wait for the li element with id 'b1cc410b' to be present on the page
li_element = WebDriverWait(driver, 10).until(EC.presence_of_element_located((By.CSS_SELECTOR, 'li#b1cc410b')))

data_lat = li_element.get_attribute('data-lat')
data_lng = li_element.get_attribute('data-lng')
artist_name = li_element.find_element(By.TAG_NAME, 'a').text
address = li_element.find_elements(By.TAG_NAME, 'p')[1].text
city = li_element.find_elements(By.TAG_NAME, 'p')[2].text

# Print the extracted data
print(data_lat)
print(data_lng)
print(artist_name)
print(address)
print(city)

output:

34.102025
-118.32694167
Tristan Eaton
6301 Hollywood Boulevard
Los Angeles, California

You can install selenium using pip:

pip install selenium

[UPDATE]:

  • If the id is known beforehand, you can store it in a variable and build both the URL and the selector with an f-string, which makes the code reusable.
  • To avoid the InvalidSelectorException that you get for some ids (for example 1456a64a), use the attribute selector li[id="id_value"] instead of li#id_value. An id selector such as li#1456a64a is not valid CSS, because an identifier cannot start with an unescaped digit.
    from selenium.webdriver import Chrome
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support.wait import WebDriverWait
    import selenium.webdriver.support.expected_conditions as EC
    
    # Create a Chrome driver instance
    driver = Chrome()
    variable_name = '1456a64a' # fdd8a7d5, b1cc410b
    url = f'https://findmasa.com/view/map#{variable_name}'
    driver.get(url)
    
    # Wait for the li element with the given id to be present on the page
    li_element = WebDriverWait(driver, 10).until(EC.presence_of_element_located((By.CSS_SELECTOR, f'li[id="{variable_name}"]')))
    
    data_lat = li_element.get_attribute('data-lat')
    data_lng = li_element.get_attribute('data-lng')
    artist_name = li_element.find_element(By.TAG_NAME, 'a').text
    address = li_element.find_elements(By.TAG_NAME, 'p')[1].text
    city = li_element.find_elements(By.TAG_NAME, 'p')[2].text
    
    # Print the extracted data
    print(data_lat)
    print(data_lng)
    print(artist_name)
    print(address)
    print(city)
    
    output:
    34.0536722
    -118.3041877
    unknown
    960 South Harvard Boulevard
    Los Angeles, California
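
The selector rule from the second bullet can be sketched as a small helper. This is only an illustration: `is_plain_css_ident` is a simplified, ASCII-only approximation of the CSS identifier grammar, and both function names are hypothetical.

```python
import re

# Simplified approximation (assumption): an optional hyphen, then a letter or
# underscore, then letters, digits, underscores or hyphens. The real CSS
# grammar also allows escapes and non-ASCII characters.
_SIMPLE_IDENT = re.compile(r"^-?[A-Za-z_][A-Za-z0-9_-]*$")


def is_plain_css_ident(value):
    """Return True if value can be written after '#' without escaping."""
    return bool(_SIMPLE_IDENT.match(value))


def li_selector(element_id):
    """Build an attribute selector, which also works for ids starting with a digit."""
    if '"' in element_id or "\\" in element_id:
        raise ValueError("id must not contain quotes or backslashes")
    return f'li[id="{element_id}"]'


for mural_id in ("b1cc410b", "fdd8a7d5", "1456a64a"):
    print(mural_id, is_plain_css_ident(mural_id), li_selector(mural_id))
```

Only 1456a64a fails the check, which matches the id that raised the exception; the attribute selector sidesteps the problem for all three ids.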
    
Ajeet Verma
  • Thank you very much for your clarification. The first time I ran this updated code, I got the same output as yours, but the second time I got an error message. Do you know what the problem is? – Jessie H Jul 19 '23 at 02:22
  • the only source of this TimeoutException is this line `li_element = WebDriverWait(driver, 10).until(EC.presence_of_element_located((By.CSS_SELECTOR, 'li#b1cc410b'))) `, perhaps you could try increasing the wait time `li_element = WebDriverWait(driver, 20).until(EC.presence_of_element_located((By.CSS_SELECTOR, 'li#b1cc410b'))) ` and see if that works – Ajeet Verma Jul 19 '23 at 02:25
  • I changed the wait time from 10 to 30 and it worked again... Am I doing it correctly? I ask because I have 100+ links of this kind and need to get the same information from each of them. – Jessie H Jul 19 '23 at 02:26
  • Yes, I think it will work. You could also raise the wait time to 50 to cover cases where the element takes longer to load, although 30 should be enough. It also depends on your internet speed. – Ajeet Verma Jul 19 '23 at 02:31
  • Thank you! For this line `li_element = WebDriverWait(driver, 60).until(EC.presence_of_element_located((By.CSS_SELECTOR, 'li#fdd8a7d5')))`, how should I replace `fdd8a7d5` with a variable that holds the same id? – Jessie H Jul 19 '23 at 02:53
  • If you want to make it dynamic using a variable (let's say `variable_name`) that holds the id, you can simply use an f-string: `li_element = WebDriverWait(driver, 60).until(EC.presence_of_element_located((By.CSS_SELECTOR, f'li#{variable_name}')))` – Ajeet Verma Jul 19 '23 at 03:00
  • I received another error message when I ran this code to extract information from the link findmasa.com/view/map#1456a64a. Do you know what the problem is and how to fix it? Thank you again for all your kind help!! – Jessie H Jul 19 '23 at 21:43
  • Please check the solution in the updated answer above – Ajeet Verma Jul 20 '23 at 02:24
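
For the 100+ links mentioned in the comments, one possible way to reuse the updated code is to loop over the ids with a single browser instance and record the ids that time out instead of stopping at the first TimeoutException. This is a sketch under assumptions, not tested against the live site: `build_target`, `scrape_mural` and `scrape_all` are hypothetical names, and the `p` indexes are taken from the answer's code.

```python
def build_target(mural_id):
    """Return the page URL and the CSS selector for one mural id."""
    url = f"https://findmasa.com/view/map#{mural_id}"
    selector = f'li[id="{mural_id}"]'  # attribute form also handles ids starting with a digit
    return url, selector


def scrape_mural(driver, mural_id, timeout=30):
    """Return a dict for one mural, or None if its element does not appear in time."""
    # Imported here so that build_target can be used without Selenium installed.
    from selenium.common.exceptions import TimeoutException
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.wait import WebDriverWait

    url, selector = build_target(mural_id)
    driver.get(url)
    try:
        li_element = WebDriverWait(driver, timeout).until(
            EC.presence_of_element_located((By.CSS_SELECTOR, selector))
        )
    except TimeoutException:
        return None

    # Look up the paragraphs once instead of once per field.
    paragraphs = li_element.find_elements(By.TAG_NAME, "p")
    return {
        "id": mural_id,
        "lat": li_element.get_attribute("data-lat"),
        "lng": li_element.get_attribute("data-lng"),
        "artist": li_element.find_element(By.TAG_NAME, "a").text,
        "address": paragraphs[1].text,
        "city": paragraphs[2].text,
    }


def scrape_all(mural_ids, timeout=30):
    """Scrape every id with one Chrome instance; return (rows, failed_ids)."""
    from selenium.webdriver import Chrome

    driver = Chrome()
    rows, failed = [], []
    try:
        for mural_id in mural_ids:
            row = scrape_mural(driver, mural_id, timeout)
            if row is None:
                failed.append(mural_id)
            else:
                rows.append(row)
    finally:
        driver.quit()  # close the browser even if something goes wrong
    return rows, failed
```

Calling `scrape_all(['b1cc410b', 'fdd8a7d5', '1456a64a'])` would return the collected rows together with the list of ids that timed out, so those can be retried with a longer timeout.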