
I have recently started using BeautifulSoup for web scraping. I am trying to extract the names of all the artists from the first page of the National Gallery of Art website.

Here is my code:

import requests
from bs4 import BeautifulSoup

data = requests.get('https://www.nga.gov/Collection/artists.html?pageNumber=1')
soup = BeautifulSoup(data.content, 'html.parser')
soup.find_all('a')

When I do this, I get all the links present on the page except the ones that contain artist names.

For example, for the artist "Greek A" Factory, Chrome's Inspect option shows a tag with the text '"Greek A" Factory', but this tag is not found anywhere in the soup object I created. What mistake am I making here?

deceze
sampippin
  • The `ul` element with class `returns`, which holds the hyperlink with the text "Greek A" Factory, is filled in dynamically, so it is not part of the HTML that `requests` downloads. Here's a related [question](https://stackoverflow.com/questions/17597424/how-to-retrieve-the-values-of-dynamic-html-content-using-python) – skrubber Nov 02 '17 at 03:05
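
A quick way to confirm that the artist links are missing from the downloaded HTML, rather than being dropped by the parser, is to search the raw response text directly. The sketch below is only an illustration: `appears_in_static_html` is a hypothetical helper, and the two HTML strings are made-up stand-ins for the page before and after JavaScript runs, not the real markup.

```python
import html


def appears_in_static_html(raw_html: str, text: str) -> bool:
    """Return True if `text` occurs in the HTML exactly as the server sent it.

    Character references such as &quot; are decoded first, so a name that
    contains quotes still matches.
    """
    return text in html.unescape(raw_html)


# Hypothetical stand-ins (assumed markup, not copied from the real page):
# what requests might receive, and what the browser shows after rendering.
static_html = '<ul class="returns"></ul>'
rendered_html = (
    '<ul class="returns"><li class="title">'
    '<a href="#">&quot;Greek A&quot; Factory</a></li></ul>'
)

print(appears_in_static_html(static_html, '"Greek A" Factory'))
print(appears_in_static_html(rendered_html, '"Greek A" Factory'))
```

With the code from the question, the same check would be `appears_in_static_html(data.text, 'Greek A')`. If it returns False while the name is visible in Chrome, the content is being added by JavaScript after the page loads.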

1 Answer


Try this:

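from selenium import webdriver
from bs4 import BeautifulSoup
import time

# Load the page in a real browser so that its JavaScript runs
driver = webdriver.Chrome()
driver.get('https://www.nga.gov/Collection/artists.html?pageNumber=1')
time.sleep(5)  # crude fixed wait for the artist list to render
soup = BeautifulSoup(driver.page_source, 'lxml')  # requires the lxml package
driver.quit()

# Artist names are the link text inside elements with class "title"
for artist_name in soup.select('.title a'):
    print(artist_name.text)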

Partial results:

"Greek A" Factory
2 Bit Comics
7 Freds Press
A. B.
Aachen, Hans von
Aarland, Johann Carl Wilhelm
Abakanowicz, Magdalena
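
As a possible refinement, the fixed `time.sleep(5)` could be swapped for an explicit wait that returns as soon as the first artist link exists. This is a sketch rather than a tested drop-in: `fetch_rendered_html` and `extract_artist_names` are hypothetical helper names, the 10-second timeout is an arbitrary choice, and it assumes the `.title a` selector still matches the page. It uses `html.parser` so that `lxml` is not required.

```python
from bs4 import BeautifulSoup


def extract_artist_names(page_html):
    """Return the text of every link inside an element with class "title"."""
    soup = BeautifulSoup(page_html, "html.parser")
    return [a.get_text(strip=True) for a in soup.select(".title a")]


def fetch_rendered_html(url, timeout=10):
    """Load `url` in Chrome and return the HTML once an artist link is present.

    Selenium is imported here so the parsing helper above can be used
    without it. Raises selenium's TimeoutException if no element matching
    ".title a" appears within `timeout` seconds.
    """
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    driver = webdriver.Chrome()
    try:
        driver.get(url)
        WebDriverWait(driver, timeout).until(
            EC.presence_of_element_located((By.CSS_SELECTOR, ".title a"))
        )
        return driver.page_source
    finally:
        # Close the browser even if the wait times out
        driver.quit()
```

Usage would be `extract_artist_names(fetch_rendered_html('https://www.nga.gov/Collection/artists.html?pageNumber=1'))`. Keeping the parsing separate from the browser step also makes it easy to try the selector on saved HTML.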
SIM