I've written a Python script to scrape profile names from different websites. Each link points to one person's profile page. For now, I'm only interested in scraping the profile names. My script contains three links, each to a different person on a different site, and it works fine: I used CSS selectors to pick out the name on each of the three sites. With so few links, this was manageable. However, there could have been hundreds of links.
Now, my question is: since each site's source code is very different from the others, how can I get the profile names from all of those sites with a single script, other than what I did here, which was to include a separate selector for each site because I already knew them? What if there are hundreds of links?
Here is what I've written to get the profile names (it's doing fine here):
import requests
from bs4 import BeautifulSoup

# A list (not a set) so the links are visited in the order written.
links = [
    "https://www.paulweiss.com/professionals/associates/robert-j-agar",
    "http://www.cadwalader.com/index.php?/professionals/matthew-lefkowitz",
    "https://www.kirkland.com/sitecontent.cfm?contentID=220&itemID=12061",
]

for link in links:
    res = requests.get(link)
    soup = BeautifulSoup(res.text, "lxml")
    # Container selectors for the three sites, combined into one selector list.
    for item in soup.select("#leftnav,.article,.main-content-container"):
        # Name selectors for the three sites; take the first match in the container.
        pro_name = item.select(".page-hdr h1,b.hidepf,.bioBreadcrumb span")[0].text
        print(pro_name)
Output:
Robert J Agar
Matthew Lefkowitz
Mark Adler
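
To make the per-site approach explicit, here is a minimal sketch of the same idea with the name selectors keyed by hostname instead of merged into one comma-separated string. It still needs one entry per site, which is exactly the part I'd like to avoid. `NAME_SELECTORS`, `selector_for` and `extract_name` are names made up for illustration, and the pairing of each selector with a host is only assumed from the order in which they appear in the script, so it should be checked against the live pages.

```python
from urllib.parse import urlparse

# Hypothetical mapping: hostname -> CSS selector for the profile name.
# The pairing is assumed from the order of the links and selectors in the
# script above; verify each entry against the actual page.
NAME_SELECTORS = {
    "paulweiss.com": ".page-hdr h1",
    "cadwalader.com": "b.hidepf",
    "kirkland.com": ".bioBreadcrumb span",
}


def selector_for(url, selectors=NAME_SELECTORS):
    """Return the name selector registered for the URL's host, or None."""
    host = urlparse(url).hostname or ""
    # Treat "www.example.com" and "example.com" as the same site.
    if host.startswith("www."):
        host = host[4:]
    return selectors.get(host)


def extract_name(soup, url):
    """Return the profile name from an already parsed page, or None."""
    selector = selector_for(url)
    if selector is None:
        return None  # no selector registered for this site
    node = soup.select_one(selector)
    return node.get_text(strip=True) if node else None
```

Inside the loop above this would be called as `extract_name(soup, link)`, where `soup` is the `BeautifulSoup` object built from the response. A site with no entry in the mapping returns `None` instead of raising an `IndexError`.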