
I'm new to web scraping. I'm trying to scrape (if possible) the company names (and maybe the descriptions) listed on this website: https://www.memberleap.com/members/directory/search_csam.php. The code below returns an empty list, and using soup.find instead returns None.

import requests
from bs4 import BeautifulSoup
import pandas as pd

url = 'https://www.memberleap.com/members/directory/search_csam.php'
# fetch the page once (the original made a redundant second request)
page = requests.get(url)

soup = BeautifulSoup(page.text, 'lxml')

# uncomment to view html info
#print(soup)

name_plate = soup.find_all('div', class_='name-plate')
print(name_plate)

[Image: HTML of the website I'm trying to scrape]

I want to obtain the names of the companies listed inside the name-plate elements. In this case, they are ADG Creative and Corporate Office Properties Trust. Is it possible to do this, given that the href differs for each company name?

codingcat

1 Answer


The names you see on the page are loaded from a separate URL via JavaScript, so they are not present in the HTML that requests.get(url) returns. To simulate that request, you can use the following example:

import requests
from bs4 import BeautifulSoup

api_url = 'https://www.memberleap.com/members/directory/search_csam_ajax.php?org_id=CSAM'

data = {
    "buttonNum": "B1",
    "searchValues[bci]": "0",
    "searchValues[order]": "alpha",
    "searchValues[keyword]": "",
    "searchValues[employer_only]": "",
    "searchValues[county]": "",
    "searchValues[zip_code]": "",
    "searchValues[zip_range]": "5",
    "searchValues[providerassociate]": "",
    "searchValues[dynamic_field_3786]": "0",
    "searchValues[dynamic_field_3787]": "0",
    "searchValues[latitude]": "",
    "searchValues[longitude]": "",
    "two_column": ""
}

headers = {'X-Requested-With': 'XMLHttpRequest'}


for p in range(1, 4): # <-- increase the page numbers here
    data['buttonNum'] = f'B{p}'
    soup = BeautifulSoup(requests.post(api_url, data=data, headers=headers).content, 'html.parser')
    for n in soup.select('.name-plate'):
        print(n.text)

Prints:


...
Transwestern
TriBridge Partners, LLC
Univ. of MD Advanced Cybersecurity Experience for Students
University System of Maryland
Venture Potential
Weller Development Company
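
If you want the names (and each company's link) collected in a list instead of printed, something along these lines may work. This is only a sketch: the sample HTML, the extract_companies helper, and the assumption that each name-plate wraps an <a> tag are hypothetical, since the real markup returned by the endpoint may differ. Selecting by the name-plate class means the changing href does not matter.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup; the real response may be structured differently.
SAMPLE_HTML = """
<div class="name-plate"><a href="/profile?id=101">ADG Creative</a></div>
<div class="name-plate"><a href="/profile?id=202">Corporate Office Properties Trust</a></div>
"""


def extract_companies(html):
    """Return a list of {'name', 'href'} dicts, one per .name-plate element."""
    soup = BeautifulSoup(html, "html.parser")
    companies = []
    for plate in soup.select(".name-plate"):
        link = plate.find("a")  # assumed: the name is wrapped in a link
        companies.append({
            "name": plate.get_text(strip=True),
            "href": link.get("href") if link else None,
        })
    return companies


companies = extract_companies(SAMPLE_HTML)
for company in companies:
    print(company["name"], company["href"])
```

In the loop above you would pass the content of each POST response to extract_companies instead of SAMPLE_HTML.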
Andrej Kesely
  • Thanks, this is really helpful. I have a few questions. How did you find the API URL? Was it inferred from the .php extension? The API URL is inaccessible (at least to me); how did you know how to populate 'data' with the fields provided? – codingcat May 25 '23 at 18:57
  • @codingcat When you open Web Developer Tools in Chrome/Firefox -> Network tab and click on any pagination button, the URL should then be visible there. – Andrej Kesely May 25 '23 at 19:01
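
As for populating 'data': the Network tab also shows the form body sent with that request, and one way to turn it into a dict is to copy the raw URL-encoded body and parse it. The sketch below assumes the body is form-encoded; the raw_body string is a hypothetical, abbreviated example built from a few of the fields in the answer, not a capture from the site, and body_to_data is a made-up helper name.

```python
from urllib.parse import parse_qsl

# Hypothetical, abbreviated request body as it might be copied from the
# Network tab; the real one contains more fields.
raw_body = (
    "buttonNum=B1"
    "&searchValues%5Bbci%5D=0"
    "&searchValues%5Border%5D=alpha"
    "&searchValues%5Bkeyword%5D="
    "&searchValues%5Bzip_range%5D=5"
)


def body_to_data(body):
    """Decode a URL-encoded form body into a dict usable as requests' data=."""
    # keep_blank_values=True preserves empty fields such as the keyword
    return dict(parse_qsl(body, keep_blank_values=True))


data = body_to_data(raw_body)
print(data)
```

The %5B and %5D sequences decode to [ and ], which gives keys such as searchValues[bci], matching the ones used in the answer.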