pandas.read_html only returns the table data present in the HTML as initially loaded, before any scrolling. The rows that would load after scrolling are therefore missing from the returned list of DataFrames. How can I get it to return the list of DataFrames only after the following steps have run:
- Scroll to the bottom
- Wait for the content to load
- If no more content loads, return
- Otherwise, go back to step 1
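The steps above could be sketched roughly as follows. This is only an illustration under assumptions, not a confirmed solution: it assumes a real browser driven by Selenium (not part of my code), that new rows make the page taller, and that a fixed pause is long enough for them to load. The names scroll_until_stable and read_html_after_scrolling, and the pause and max_rounds parameters, are hypothetical.

```python
import time
from io import StringIO


def scroll_until_stable(driver, pause=2.0, max_rounds=50):
    """Scroll to the bottom until the page height stops growing.

    Returns the number of scrolls that loaded new content.
    """
    get_height = "return document.documentElement.scrollHeight"
    last_height = driver.execute_script(get_height)
    loaded = 0
    for _ in range(max_rounds):
        # Step 1: scroll to the bottom.
        driver.execute_script(
            "window.scrollTo(0, document.documentElement.scrollHeight);"
        )
        # Step 2: wait for content to load (fixed pause is an assumption).
        time.sleep(pause)
        # Step 3: if the height did not change, nothing more loaded.
        new_height = driver.execute_script(get_height)
        if new_height == last_height:
            break
        # Step 4: otherwise remember the new height and repeat.
        last_height = new_height
        loaded += 1
    return loaded


def read_html_after_scrolling(url, pause=2.0):
    # Imported here so the scroll helper works without these packages.
    import pandas as pd
    from selenium import webdriver

    driver = webdriver.Chrome()  # assumes Chrome is installed
    try:
        driver.get(url)
        scroll_until_stable(driver, pause=pause)
        # Parse the rendered HTML instead of re-fetching the URL.
        return pd.read_html(StringIO(driver.page_source))
    finally:
        driver.quit()
```

With something like this, pd.read_html(url) would be replaced by read_html_after_scrolling(url). Whether the page height is the right signal for this particular site is an assumption; counting the table rows between scrolls would be an alternative check.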
My Code:
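import pandas as pd

# Yahoo Finance historical prices for GOOG, daily interval
url = 'https://finance.yahoo.com/quote/GOOG/history?period1=1566844200&period2=1598466600&interval=1d&filter=history&frequency=1d'
# read_html fetches the page once and parses every <table> in that HTML
dfs = pd.read_html(url)
print(dfs[0])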
Actual Result:
             Date     Open     High      Low   Close*  Adj Close**   Volume
0    Aug 26, 2020  1608.00  1659.22  1603.60  1652.38      1652.38  3993400
1    Aug 25, 2020  1582.07  1611.62  1582.07  1608.22      1608.22  2247100
2    Aug 24, 2020  1593.98  1614.17  1580.57  1588.20      1588.20  1409900
3    Aug 21, 2020  1577.03  1597.72  1568.01  1580.42      1580.42  1446500
4    Aug 20, 2020  1543.45  1585.87  1538.20  1581.75      1581.75  1706900
...           ...      ...      ...      ...      ...          ...      ...
96   Apr 09, 2020  1224.08  1225.57  1196.73  1211.45      1211.45  2175400
97   Apr 08, 2020  1206.50  1219.07  1188.16  1210.28      1210.28  1975100
98   Apr 07, 2020  1221.00  1225.00  1182.23  1186.51      1186.51  2387300
99   Apr 06, 2020  1138.00  1194.66  1130.94  1186.92      1186.92  2664700
100          *CPA     *CPA     *CPA     *CPA     *CPA         *CPA     *CPA
[101 rows × 7 columns]
Expected Result:
             Date     Open     High      Low   Close*  Adj Close**   Volume
0    Aug 26, 2020  1608.00  1659.22  1603.60  1652.38      1652.38  3993400
1    Aug 25, 2020  1582.07  1611.62  1582.07  1608.22      1608.22  2247100
2    Aug 24, 2020  1593.98  1614.17  1580.57  1588.20      1588.20  1409900
3    Aug 21, 2020  1577.03  1597.72  1568.01  1580.42      1580.42  1446500
4    Aug 20, 2020  1543.45  1585.87  1538.20  1581.75      1581.75  1706900
...           ...      ...      ...      ...      ...          ...      ...
249  Apr 30, 2019  1224.08  1225.57  1196.73  1211.45      1211.45  2175400
250  Apr 29, 2019  1206.50  1219.07  1188.16  1210.28      1210.28  1975100
251  Apr 27, 2019  1221.00  1225.00  1182.23  1186.51      1186.51  2387300
252  Aug 26, 2019  1138.00  1194.66  1130.94  1186.92      1186.92  2664700
253          *CPA     *CPA     *CPA     *CPA     *CPA         *CPA     *CPA
[253 rows × 7 columns]