This might not be the smartest question, but I've spent about an hour researching and trying to figure this out and ended up with nothing. As a last resort, I am posting my problem here.

The website I am using is https://en.wikipedia.org/wiki/List_of_Super_Bowl_halftime_shows and I want to scrape the tables listed under "History".

When I inspect the page, I see that each table sits under an anchor tag with a specific title.

I do not mind scraping each table individually/manually, but no matter how I try to navigate to a table through its anchor and title, my BeautifulSoup object does not contain any of the table's contents.

I'm guessing the href attribute is used to display the table, so my question is: how can I scrape the contents of a webpage that pulls them in through another link that I do not have access to?

  • Do you want to scrape _all_ tables under "history"? – MendelG Dec 27 '21 at 04:27
  • yes every single superbowl event – swordlordswamplord Dec 27 '21 at 04:28
  • i see it is referencing a link to display the table. maybe im wrong. but i dont see how i can access the table because it seems the link that is used on wikipedia has to do with some local path/link for the user that posted it. – swordlordswamplord Dec 27 '21 at 04:29
  • Do you have the pandas library installed? – MendelG Dec 27 '21 at 04:31
  • yes i am using pandas for this project. u can assume i know the basics of data science with python – swordlordswamplord Dec 27 '21 at 04:32
  • this is actually a "project" on the datacamp website but the problem with it is that it just offers up all the libraries/data and i want to do it all from scratch because i expect that is how it will be done realistically in a job setting or if i want to explore anything on my own so i am trying to do the websites project on my own instead of having my hand held all the way cus ill learn nothing – swordlordswamplord Dec 27 '21 at 04:34

1 Answer

Since you are using pandas, you can use read_html() to get all tables on the page as a list of DataFrames and access a specific table by index.

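import pandas as pd

# read_html() returns a list of DataFrames, one per <table> found on the page.
# It needs an HTML parser installed (lxml, or bs4 together with html5lib).
tables = pd.read_html("https://en.wikipedia.org/wiki/List_of_Super_Bowl_halftime_shows")
print(tables[0].to_string())  # <-- Access the first table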
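
On the question of the href: as far as I can tell, the anchor only links to another place, and the tables themselves are ordinary <table> elements in the page's HTML, so no second link has to be followed. Below is a minimal sketch of that idea using only the standard library. The `TableExtractor` class and the `SAMPLE_HTML` snippet are hypothetical names and made-up placeholder data (not taken from the Wikipedia page), and the parser assumes tables are not nested.

```python
from html.parser import HTMLParser


class TableExtractor(HTMLParser):
    """Collect every <table> as a list of rows; each row is a list of cell strings.

    Hypothetical helper for illustration; it does not handle nested tables.
    """

    def __init__(self):
        super().__init__()
        self.tables = []   # one list of rows per <table> seen so far
        self._row = None   # cells of the row being read, or None
        self._cell = None  # text fragments of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.tables.append([])
        elif tag == "tr" and self.tables:
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Text inside a cell is kept even when wrapped in <a href=...>.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.tables[-1].append(self._row)
            self._row = None


# Made-up placeholder markup: an anchor heading followed by two plain tables.
SAMPLE_HTML = """
<h2><a href="#History" title="History">History</a></h2>
<table>
  <tr><th>Game</th><th>Performer</th></tr>
  <tr><td>Game 1</td><td><a href="/wiki/Placeholder">Performer A</a></td></tr>
</table>
<table>
  <tr><th>Game</th><th>Performer</th></tr>
  <tr><td>Game 2</td><td>Performer B</td></tr>
</table>
"""

parser = TableExtractor()
parser.feed(SAMPLE_HTML)
for table in parser.tables:
    print(table)
```

The anchor text is skipped because it sits outside any cell, while the link inside a cell still contributes its visible text. For the real page, read_html() does this work for you and also handles cases this sketch ignores.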
MendelG