I'm trying to scrape data about coins from a website into a Python DataFrame on Google Colab

Question

I'm working with a student who is obsessed with coins. We're trying to write Python code in a Colab notebook that will scrape a particular website to extract specific data about coins and add it to a pandas df we already created.

Here is the code for the df we created manually. It contains all the information we want to extract automatically for the other coins on the site, and it works great.

import pandas as pd
pennies = pd.DataFrame({
  'category': ['Indian Head'],
  'coin': ['1859 1C MS'],
  'mint': ['Philadelphia'],
  'mintage': [36400000],
  'composition': ['Copper-Nickel'],
  'weight_grams': [4.67],
  'diameter_mm': [19],
  'edge': ['Plain'],
  'obverse_url': ['https://s3.amazonaws.com/ngccoin-production/us-coin-explorer/5747236-003o.jpg'],
  'reverse_url': ['https://www.ngccoin.com/coin-explorer/united-states/cents/indian-cents-1859-1909/12057/1859-1c-ms/']
})

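# the year is the first four characters of the coin label, e.g. '1859 1C MS' -> '1859'
pennies['date'] = pennies['coin'].str[:4]
# put the columns in the desired order
pennies = pennies[['category', 'coin', 'date', 'mint', 'mintage', 'composition', 'weight_grams', 'diameter_mm', 'edge', 'obverse_url', 'reverse_url']]

pennies.head()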

This website lists all the coins we'd like to add.

First, we're trying to write a function that will scrape the relevant info about the next coin and add it to the df. The page for that coin is here.
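The "add it to the df" half of such a function can be kept separate from the scraping half. Here is a minimal sketch, assuming the scraper produces a plain dict keyed by the df's column names. `add_coin` and `COLUMNS` are hypothetical names. The helper derives `date` from the first four characters of `coin`, as the manual df does, and appends the row with `pd.concat`. Fields the scraper has not filled in yet become NaN. Only `coin` and `category` are given for the 1860 coin below, because its other values are not known here.

```python
import pandas as pd

# Column order used by the manually built `pennies` DataFrame
COLUMNS = ['category', 'coin', 'date', 'mint', 'mintage', 'composition',
           'weight_grams', 'diameter_mm', 'edge', 'obverse_url', 'reverse_url']


def add_coin(df: pd.DataFrame, record: dict) -> pd.DataFrame:
    """Return a new DataFrame with `record` appended as one row.

    `record` maps column names to scraped values (it must contain 'coin');
    any column it lacks becomes NaN.
    """
    row = pd.DataFrame([record])
    # same rule as the manual df: the year is the first four characters
    row['date'] = row['coin'].str[:4]
    # reindex adds missing columns as NaN and fixes the column order
    row = row.reindex(columns=COLUMNS)
    return pd.concat([df, row], ignore_index=True)


# Usage: start empty (or from the manual df) and add coins as they are scraped
pennies = pd.DataFrame(columns=COLUMNS)
pennies = add_coin(pennies, {'category': 'Indian Head', 'coin': '1860 1C MS'})
print(pennies[['coin', 'date', 'mint']])
```

Returning a new DataFrame instead of mutating the old one keeps each call easy to re-run cell by cell in Colab.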

I'm using ChatGPT to try to print just the value of the 'Mint:' field, and it gave me the following code, which does not work:

import requests
from bs4 import BeautifulSoup

url = 'https://www.ngccoin.com/coin-explorer/united-states/cents/indian-cents-1859-1909/12058/1860-1c-ms/'

response = requests.get(url)
soup = BeautifulSoup(response.content, 'html.parser')

specs_list = soup.find('ul', class_='ce-coin__specs-list')
mint_element = specs_list.find('strong', string='Mint:').find_next_sibling('span').text.strip()

print(mint_element)

AttributeError                            Traceback (most recent call last)
<ipython-input-29-e47935119dab> in <cell line: 10>()
      8 
      9 specs_list = soup.find('ul', class_='ce-coin__specs-list')
---> 10 mint_element = specs_list.find('strong', string='Mint:').find_next_sibling('span').text.strip()
     11 
     12 print(mint_element)

AttributeError: 'NoneType' object has no attribute 'find_next_sibling'
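The traceback shows which call failed. `specs_list` itself was found, because otherwise the error would mention `find` rather than `find_next_sibling`. The call that returned `None` was `specs_list.find('strong', string='Mint:')`. So in the HTML that `requests` received, that list has no `<strong>` whose text is exactly `'Mint:'`. Possible reasons include extra whitespace around the label or content the browser fills in later with JavaScript, which `requests` does not run. Printing `response.text` is the way to check. Chaining lookups hides which step came back empty. The sketch below checks each step and matches the label loosely. `spec_value` is a hypothetical helper, and the sample markup (`<strong>` label followed by a `<span>` value) is an assumption for illustration, not the site's verified HTML.

```python
import re
from typing import Optional

from bs4 import BeautifulSoup


def spec_value(soup: BeautifulSoup, label: str) -> Optional[str]:
    """Return the text of the <span> after a '<label>:' <strong>, or None.

    Each lookup is checked separately, so a missing element yields None
    instead of an AttributeError on a chained call.
    """
    specs_list = soup.find('ul', class_='ce-coin__specs-list')
    if specs_list is None:
        return None
    # tolerate surrounding whitespace and an optional trailing colon
    pattern = re.compile(rf'^\s*{re.escape(label)}\s*:?\s*$')
    label_tag = specs_list.find('strong', string=pattern)
    if label_tag is None:
        return None
    value_tag = label_tag.find_next_sibling('span')
    return value_tag.get_text(strip=True) if value_tag else None


# Hypothetical markup for illustration only; inspect response.text for the real structure
sample = """
<ul class="ce-coin__specs-list">
  <li><strong> Mint: </strong> <span> Philadelphia </span></li>
</ul>
"""
soup = BeautifulSoup(sample, 'html.parser')
print(spec_value(soup, 'Mint'))
```

If this still returns `None` on the real page, the specs are probably not in the static HTML at all, and fetching the data another way (for example from a JSON endpoint) is the better route.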

score -1 · Answer 1 · answered May 09 '23 at 20:02

Instead of parsing the HTML, you can request each coin's data as JSON from the site's data endpoints, one coin at a time, and then use json_normalize to build your table.
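`pd.json_normalize` turns a list of JSON objects into one row per object and flattens nested objects into dotted column names. Here is a tiny offline illustration. The records are made up. `CoinID` is the key the answer's code uses, but `Name`, `Specs`, `Mint` and `Weight` are hypothetical field names. The IDs 12057/12058 are borrowed from the page URLs in the question, and whether they equal the API's `CoinID` values is an assumption. The real columns will be whatever the endpoint returns.

```python
import pandas as pd

# Made-up nested records; only the CoinID key comes from the answer's code
records = [
    {'CoinID': 12057, 'Name': '1859 1C MS',
     'Specs': {'Mint': 'Philadelphia', 'Weight': 4.67}},
    {'CoinID': 12058, 'Name': '1860 1C MS',
     'Specs': {'Mint': None, 'Weight': None}},
]

# nested 'Specs' keys become 'Specs.Mint' and 'Specs.Weight' columns
df = pd.json_normalize(records)
print(df.columns.tolist())

# rename flattened columns to match the manually built df's schema
df = df.rename(columns={'Specs.Mint': 'mint', 'Specs.Weight': 'weight_grams'})
```

Passing `sep='_'` to `json_normalize` produces names like `Specs_Mint` instead, if dots are inconvenient.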

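import pandas as pd
import requests

data = list()
# JSON listing of every coin in the indian-cents-1859-1909 subcategory
url = 'https://www.ngccoin.com/coin-explorer/data/categories/cents/subcategories/indian-cents-1859-1909/'
response = requests.get(url)

# fetch the full JSON record for each coin in the listing
for coin in response.json()["Coins"]:
    url = f"https://www.ngccoin.com/coin-explorer/data/coins/{coin['CoinID']}/"
    response = requests.get(url)
    data.append(response.json())

# flatten the list of (possibly nested) JSON records into one DataFrame
pennies = pd.json_normalize(data)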
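When the loop runs over a whole series, it may be worth going easier on the server and failing loudly on bad responses. Below is a sketch of the same two-step fetch that adds a timeout, `raise_for_status()`, a pause between requests and a shared session. The endpoint URLs and the `Coins`/`CoinID` keys come from the answer above. Some parts are assumptions: that the `categories/.../subcategories/...` pattern works for other series, the one-second delay, and the function names `get_json` and `fetch_subcategory`. The session is passed in (for example `requests.Session()`), so the function can be tried without network access.

```python
import time

import pandas as pd

BASE = 'https://www.ngccoin.com/coin-explorer/data'


def get_json(session, url, timeout=30):
    """GET `url` with `session` and return the decoded JSON, raising on HTTP errors."""
    response = session.get(url, timeout=timeout)
    response.raise_for_status()
    return response.json()


def fetch_subcategory(session, category, subcategory, delay=1.0):
    """Fetch every coin record in a subcategory and flatten them into a DataFrame."""
    listing = get_json(session, f'{BASE}/categories/{category}/subcategories/{subcategory}/')
    data = []
    for coin in listing['Coins']:
        data.append(get_json(session, f"{BASE}/coins/{coin['CoinID']}/"))
        time.sleep(delay)  # pause between requests so the site isn't hammered
    return pd.json_normalize(data)


# Usage in Colab (needs network access):
# import requests
# with requests.Session() as session:
#     pennies = fetch_subcategory(session, 'cents', 'indian-cents-1859-1909')
```

Reusing one session keeps the HTTP connection open across the many small requests, and the timeout stops a stalled request from hanging the notebook cell indefinitely.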

not_speshal

Wow, thanks! This is the first time I'm using Stack Overflow. I'm teaching myself data exploration so I can teach it to students, and GPT is fantastic for starting projects, but it only got me so far. – Jason Sheinkopf May 09 '23 at 20:51