I'm working with a student who is obsessed with coins. We're trying to write Python code in a Colab notebook that scrapes a particular website, extracts specific data about coins, and adds it to a pandas DataFrame we already created.
Here is the code for the df we created manually with all the information we want to automatically extract from other coins on the site. It works great.
import pandas as pd
pennies = pd.DataFrame({
'category': ['Indian Head'],
'coin': ['1859 1C MS'],
'mint': ['Philadelphia'],
'mintage': [36400000],
'composition': ['Copper-Nickel'],
'weight_grams': [4.67],
'diameter_mm': [19],
'edge': ['Plain'],
'obverse_url': ['https://s3.amazonaws.com/ngccoin-production/us-coin-explorer/5747236-003o.jpg'],
'reverse_url': ['https://www.ngccoin.com/coin-explorer/united-states/cents/indian-cents-1859-1909/12057/1859-1c-ms/']
})
pennies['date'] = pennies['coin'].apply(lambda x: x[:4])
pennies = pennies[['category', 'coin', 'date', 'mint', 'mintage', 'composition', 'weight_grams', 'diameter_mm', 'edge', 'obverse_url', 'reverse_url']]
pennies.head()
This website lists all the coins we'd like to add.
First, we're trying to write a function that will scrape the relevant info about the next specific coin and add it to the df. The page with the next individual coin is here.
I'm using ChatGPT to try to print just the value for 'Mint:', and it gave me the following code, which does not work:
import requests
from bs4 import BeautifulSoup
url = 'https://www.ngccoin.com/coin-explorer/united-states/cents/indian-cents-1859-1909/12058/1860-1c-ms/'
response = requests.get(url)
soup = BeautifulSoup(response.content, 'html.parser')
specs_list = soup.find('ul', class_='ce-coin__specs-list')
mint_element = specs_list.find('strong', string='Mint:').find_next_sibling('span').text.strip()
print(mint_element)
AttributeError Traceback (most recent call last)
<ipython-input-29-e47935119dab> in <cell line: 10>()
8
9 specs_list = soup.find('ul', class_='ce-coin__specs-list')
---> 10 mint_element = specs_list.find('strong', string='Mint:').find_next_sibling('span').text.strip()
11
12 print(mint_element)
AttributeError: 'NoneType' object has no attribute 'find_next_sibling'
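Reading the traceback: the failure is on find_next_sibling, not on the first find, so specs_list itself was found and it is the 'strong' lookup that returned None. Here is a minimal sketch of one way that can happen, and of a lookup that returns None instead of raising. The HTML fragment and the get_spec helper are hypothetical: the real page markup isn't shown here, so the trailing space after "Mint:" is only an assumption used to reproduce the same behavior offline.

```python
from bs4 import BeautifulSoup

# Hypothetical markup (an assumption, not the real NGC page source).
# Note the trailing space inside the first <strong> tag.
SAMPLE_HTML = """
<ul class="ce-coin__specs-list">
  <li><strong>Mint: </strong><span>Philadelphia</span></li>
  <li><strong>Edge:</strong><span>Plain</span></li>
</ul>
"""


def get_spec(soup, label):
    """Hypothetical helper: return the text after a spec label, or None."""
    specs_list = soup.find('ul', class_='ce-coin__specs-list')
    if specs_list is None:
        return None
    # Compare against the stripped text so stray whitespace is not a miss.
    strong = specs_list.find(
        'strong', string=lambda s: s is not None and s.strip() == label
    )
    if strong is None:
        return None
    value = strong.find_next_sibling('span')
    return value.get_text(strip=True) if value is not None else None


soup = BeautifulSoup(SAMPLE_HTML, 'html.parser')

# string='Mint:' is an exact match, so 'Mint: ' (with a space) is not found.
exact = soup.find('strong', string='Mint:')
print(exact)

print(get_spec(soup, 'Mint:'))
print(get_spec(soup, 'Weight:'))  # label absent from the sample markup
```

If the real page uses different tags for the labels, or fills in the specs with JavaScript after the page loads, the selectors above would need to change; printing specs_list.prettify() in the notebook would show what requests actually received.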