Web scraping: how to index through multiple divs to obtain the name-plate value?

Question

New to web scraping. I'm trying to scrape (if possible) this website https://www.memberleap.com/members/directory/search_csam.php for the company titles listed (and maybe the descriptions). The code I've written below prints an empty list, and using soup.find instead returns None.

import requests
from bs4 import BeautifulSoup
import pandas as pd

url = 'https://www.memberleap.com/members/directory/search_csam.php'
page = requests.get(url)

soup = BeautifulSoup(page.text, 'lxml')

# uncomment to view html info
#print(soup)

name_plate = soup.find_all('div', class_ = 'name-plate')
print(name_plate)

I want to obtain the names of the companies listed inside the name-plate divs. In this case, they are ADG Creative and Corporate Office Properties Trust. Is it possible to do this, given that the href changes for each company name?

score 0 · Accepted Answer · answered May 25 '23 at 18:03

The names you see on the page are loaded from an external URL via JavaScript, so they are not in the HTML that requests receives from the page itself. To simulate this request, you can use the following example:

import requests
from bs4 import BeautifulSoup

api_url = 'https://www.memberleap.com/members/directory/search_csam_ajax.php?org_id=CSAM'

data = {
    "buttonNum": "B1",
    "searchValues[bci]": "0",
    "searchValues[order]": "alpha",
    "searchValues[keyword]": "",
    "searchValues[employer_only]": "",
    "searchValues[county]": "",
    "searchValues[zip_code]": "",
    "searchValues[zip_range]": "5",
    "searchValues[providerassociate]": "",
    "searchValues[dynamic_field_3786]": "0",
    "searchValues[dynamic_field_3787]": "0",
    "searchValues[latitude]": "",
    "searchValues[longitude]": "",
    "two_column": ""
}

headers = {'X-Requested-With': 'XMLHttpRequest'}


for p in range(1, 4): # <-- increase the page numbers here
    data['buttonNum'] = f'B{p}'
    soup = BeautifulSoup(requests.post(api_url, data=data, headers=headers).content, 'html.parser')
    for n in soup.select('.name-plate'):
        print(n.text)

Prints:


...
Transwestern
TriBridge Partners, LLC
Univ. of MD Advanced Cybersecurity Experience for Students
University System of Maryland
Venture Potential
Weller Development Company
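
If you would rather collect the names in a list than print them, the loop above could be wrapped roughly as follows. This is only a sketch: the helper names `extract_names` and `scrape_names` are made up for illustration, and the stopping rule (an empty page, or a page identical to the previous one, means the results have run out) is an assumption about how the endpoint behaves, not something confirmed by the site.

```python
import requests
from bs4 import BeautifulSoup

API_URL = "https://www.memberleap.com/members/directory/search_csam_ajax.php?org_id=CSAM"
HEADERS = {"X-Requested-With": "XMLHttpRequest"}


def extract_names(html):
    """Return the stripped text of every element with class 'name-plate'."""
    soup = BeautifulSoup(html, "html.parser")
    names = [n.get_text(" ", strip=True) for n in soup.select(".name-plate")]
    # Drop entries that contained only whitespace
    return [name for name in names if name]


def scrape_names(form_data, max_pages=50):
    """Request pages B1, B2, ... and gather all names.

    Assumption: an empty page, or a page identical to the previous one,
    signals the end of the results. max_pages is a safety cap.
    """
    all_names = []
    previous = None
    for p in range(1, max_pages + 1):
        # Copy the form data and override only the page button
        payload = dict(form_data, buttonNum=f"B{p}")
        response = requests.post(API_URL, data=payload, headers=HEADERS, timeout=30)
        response.raise_for_status()
        names = extract_names(response.text)
        if not names or names == previous:
            break
        all_names.extend(names)
        previous = names
    return all_names
```

Calling `scrape_names(data)` with the `data` dict from the example above should return the company names as a plain list, which can then be passed to pandas if needed.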

Thanks, this is really helpful. I have a few questions. How did you find the API URL? Was it assumed from the .php extension? The API URL is inaccessible (at least to me); how did you know how to populate 'data' with the fields provided? — codingcat, May 25 '23 at 18:57
@codingcat When you open Web Developer Tools in Chrome/Firefox, go to the Network tab, and click any pagination button, the request URL and its form data should be visible there. — Andrej Kesely, May 25 '23 at 19:01
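
As a possible shortcut for filling in `data`: browsers can usually show the request payload as raw URL-encoded text, and that text can be converted into a dict with the standard library instead of being typed out by hand. The sketch below assumes the payload was copied in that form; the `raw` string is a shortened, hypothetical sample, and `form_body_to_dict` is an illustrative helper name.

```python
from urllib.parse import parse_qsl


def form_body_to_dict(raw_body):
    """Turn a URL-encoded form body into a dict usable as requests.post(data=...)."""
    # keep_blank_values=True preserves empty fields such as searchValues[keyword]
    return dict(parse_qsl(raw_body, keep_blank_values=True))


# Hypothetical, shortened payload as it might be copied from the Network tab
raw = (
    "buttonNum=B1"
    "&searchValues%5Bbci%5D=0"
    "&searchValues%5Border%5D=alpha"
    "&searchValues%5Bkeyword%5D="
)
data = form_body_to_dict(raw)
print(data)
```

The percent-encoded brackets (`%5B`, `%5D`) are decoded back to `[` and `]`, so the keys come out in the same `searchValues[...]` form used in the answer.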
