I am trying to crawl the tables from this link. I located the table in the page source using the browser's F12 inspector.
I used the following code, but it returns a None
result. Could someone help? Thanks.
import requests
from bs4 import BeautifulSoup
url = 'http://bjjs.zjw.beijing.gov.cn/eportal/ui?pageId=308894'
website_url = requests.get(url).text
soup = BeautifulSoup(website_url, 'lxml')
table = soup.find('table', {'class': 'gridview'})
#table = soup.find('table', {'class': 'criteria'})
print(table)
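When `find()` returns None, the table may not have the class you expected, or it may be injected by JavaScript and absent from the raw HTML. One quick check is to list every `<table>` the downloaded source actually contains and inspect its class attribute. A minimal sketch, using a small inline HTML string in place of the downloaded page:

```python
from bs4 import BeautifulSoup

# Illustrative HTML standing in for the downloaded page source
html = """
<html><body>
  <table class="criteria"><tr><td>filters</td></tr></table>
  <table class="gridview"><tr><td>data</td></tr></table>
</body></html>
"""

soup = BeautifulSoup(html, "html.parser")

# Print the class attribute of every <table> actually present in the HTML
classes = [t.get("class") for t in soup.find_all("table")]
print(classes)  # [['criteria'], ['gridview']]
```

If the class you are searching for does not appear in this list, the content is probably loaded dynamically and the data must be fetched from the underlying request (visible in the F12 Network tab) instead.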
Please also check this reference; I want to do something similar here, but the web structure seems different.
Updated: The following code works for one page, but I need to loop over the other pages as well.
import requests
import pandas as pd
from bs4 import BeautifulSoup
url = 'http://bjjs.zjw.beijing.gov.cn/eportal/ui?pageId=308894'
website_url = requests.get(url).text
soup = BeautifulSoup(website_url, 'lxml')
table = soup.find('table', {'class': 'gridview'})
#https://stackoverflow.com/questions/51090632/python-excel-export
df = pd.read_html(str(table))[0]
df.to_excel('test.xlsx', index = False)
Output:
序号 ... 竣工备案日期
0 1 ... 2020-01-22
1 2 ... 2020-01-22
2 3 ... 2020-01-22
3 4 ... 2020-01-22
4 5 ... 2020-01-22
[5 rows x 9 columns]
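The pagination mechanism of this portal is not shown above; a common pattern is a page-number parameter in the query string, which you can confirm in the F12 Network tab when clicking to the next page. A sketch along those lines, assuming a hypothetical `currentPage` parameter (the real parameter name must be taken from the site's actual requests):

```python
import pandas as pd
import requests
from bs4 import BeautifulSoup

BASE_URL = 'http://bjjs.zjw.beijing.gov.cn/eportal/ui?pageId=308894'

def page_url(base_url: str, page: int) -> str:
    # Hypothetical query parameter; check the real name in the F12 Network tab
    return f'{base_url}&currentPage={page}'

def scrape_pages(n_pages: int) -> pd.DataFrame:
    frames = []
    for page in range(1, n_pages + 1):
        html = requests.get(page_url(BASE_URL, page)).text
        table = BeautifulSoup(html, 'lxml').find('table', {'class': 'gridview'})
        if table is not None:
            frames.append(pd.read_html(str(table))[0])
    # Stack the per-page tables into one DataFrame
    return pd.concat(frames, ignore_index=True)

# Usage (requires network access):
# df = scrape_pages(3)
# df.to_excel('test.xlsx', index=False)
```

If every page returns the same rows, the parameter name is wrong or the site paginates via a POST request, in which case `requests.post` with the form data from the Network tab is needed instead.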
Related reference: