I am trying to scrape a table from this site: http://wayback.archive-it.org/7993/20170110233205/http://www.fda.gov/Safety/Recalls/ArchiveRecalls/2015/default.htm
This table has no id or class; it only has summary and width attributes. Is there any way to scrape this table? Perhaps with XPath? I heard that XPath is not compatible with BeautifulSoup, and I hope that is wrong. The table's markup starts like this:
<table width="100%" cellpadding="3" border="1" summary="Layout showing RecallTest table with 6 columns: Date,Brand Name,Product Description,Reason/Problem,Company,Details/Photo" style="margin-bottom:28px">
<thead>
<tr>
<th scope="col" data-type="numeric" data-toggle="true"> Date </th>
</tr>
</thead>
<tbody>
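For a table like the one above, one option that may avoid XPath altogether is to match on the summary attribute with a CSS attribute selector. BeautifulSoup itself has no XPath support (that would need something like lxml), but its select_one method accepts selectors such as table[summary*="RecallTest"]. The following is only a minimal sketch under assumptions: SAMPLE_HTML is a made-up stand-in for the downloaded page, the two date rows are invented sample values, and find_recall_table and first_column are hypothetical helper names, not anything from the site or the library.

```python
from bs4 import BeautifulSoup

# Made-up stand-in for the downloaded page; the real markup would come
# from requests.get(...).text. The date values are invented samples.
SAMPLE_HTML = """
<table width="50%"><tr><td>navigation</td></tr></table>
<table width="100%" cellpadding="3" border="1" summary="Layout showing RecallTest table with 6 columns: Date,Brand Name">
  <thead><tr><th scope="col"> Date </th></tr></thead>
  <tbody>
    <tr><td> 12/31/2015 </td></tr>
    <tr><td> 12/30/2015 </td></tr>
  </tbody>
</table>
"""


def find_recall_table(soup):
    """Return the first <table> whose summary contains 'RecallTest', or None."""
    # CSS attribute selector: *= means "attribute value contains this substring"
    return soup.select_one('table[summary*="RecallTest"]')


def first_column(table):
    """Collect the stripped text of the first <td> in each row of the table."""
    values = []
    for row in table.find_all('tr'):
        cells = row.find_all('td')
        if cells:  # the header row only has <th> cells, so it is skipped
            values.append(cells[0].get_text(strip=True))
    return values


soup = BeautifulSoup(SAMPLE_HTML, 'html.parser')
table = find_recall_table(soup)
dates = first_column(table)
print(dates)
```

Matching on a substring of summary rather than the full string is a deliberate choice here, on the assumption that the exact column list in the attribute could differ between archive pages. select_one returns None when nothing matches, so a check for that is worth adding before looping over rows.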
Here is my code:
import requests
import pandas as pd
from bs4 import BeautifulSoup

link = 'http://wayback.archive-it.org/7993/20170110233205/http://www.fda.gov/Safety/Recalls/ArchiveRecalls/2015/default.htm'
page = 15
pdf = []
for p in range(1, page + 1):
    l = link + '?page=' + str(p)
    # Downloading contents of the web page
    data = requests.get(l).text
    # Creating BeautifulSoup object
    soup = BeautifulSoup(data, 'html.parser')
    tables = soup.find_all('table')
    # Placeholder: this is the selector I do not know how to write
    table = soup.find('table', INSERT XPATH EXPRESSION)
    df = pd.DataFrame(columns=['date', 'brand', 'descr', 'reason', 'company'])
    for row in table.tbody.find_all('tr'):
        # Find all data for each column
        columns = row.find_all('td')
        if columns != []:
            date = columns[0].text.strip()