I need to analyze three years of historical stock data, which I fetch from this page: "https://finance.yahoo.com/quote/AAPL/history?period1=1476255600&period2=1570863600&interval=1d&filter=history&frequency=1d". However, when I check the data frame, it contains only three months of data.

I tried changing the date range, and the website did reflect the changes. However, I still get only three months of data regardless of the period I request. I fetch the URL as follows:


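from urllib.request import urlopen

import pandas as pd
from bs4 import BeautifulSoup

url = "https://finance.yahoo.com/quote/AAPL/history?period1=1476255600&period2=1570863600&interval=1d&filter=history&frequency=1d"

# Open the link and parse the HTML
html = urlopen(url)
soup = BeautifulSoup(html, "html.parser")

# Collect the text of every table cell, row by row
data = []
allrows = soup.find_all("tr")
for row in allrows:
    row_list = row.find_all("td")
    dataRow = []
    for cell in row_list:
        dataRow.append(cell.text)
    data.append(dataRow)

data = data[6:]

# Not shown in the original post: column names reconstructed from the output below
header_list = ["Date", "Open", "High", "Low", "Close*", "Adj Close**", "Volume"]
df = pd.DataFrame(data)

# Check data head and tail to ensure the dataframe is correct
df.columns = header_list
print(df.head())
print(df.tail())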
----------


          Date    Open    High     Low  Close* Adj Close**      Volume
0  Oct 04, 2019  225.64  227.49  223.89  227.01      227.01  34,619,700
1  Oct 03, 2019  218.43  220.96  215.13  220.82      220.82  28,606,500
2  Oct 02, 2019  223.06  223.58  217.93  218.96      218.96  34,612,300
3  Oct 01, 2019  225.07  228.22  224.20  224.59      224.59  34,805,800
4  Sep 30, 2019  220.90  224.58  220.79  223.97      223.97  25,977,400
                                                 Date    Open    High     Low  Close* Adj Close**      Volume
91                                       May 29, 2019  176.42  179.35  176.00  177.38      176.71  28,481,200
92                                       May 28, 2019  178.92  180.59  177.91  178.23      177.56  27,948,200
93                                       May 24, 2019  180.20  182.14  178.62  178.97      178.29  23,714,700
94                                       May 23, 2019  179.80  180.54  177.81  179.66      178.98  36,529,700
95  *Close price adjusted for splits.**Adjusted cl...    None    None    None    None        None        None
#check data shape
df.shape
(96, 7)
Gladies Chang
  • Can you share the request? – y.luis.rojo Oct 12 '19 at 17:58
  • Questions seeking debugging help ("why isn't this code working?") must include the desired behavior, a specific problem or error and the shortest code necessary to reproduce it in the question itself. Questions without a clear problem statement are not useful to other readers. See: [mcve] – QHarr Oct 12 '19 at 20:29
  • I used urlopen to open the link and then BeautifulSoup to parse it. This is my code: url = "https://finance.yahoo.com/quote/AAPL/history?period1=1476255600&period2=1570863600&interval=1d&filter=history&frequency=1d"; html = urlopen(url); soup = BeautifulSoup(html) – Gladies Chang Oct 12 '19 at 21:12

1 Answer

The page loads more data as you scroll down. BeautifulSoup does not scroll, so you see only the first portion of the data.

How to handle it? In the "Network" tab of your browser's developer tools you can see the data requests the page makes as you scroll. You can either imitate those requests and get the data straight from the API, or use a headless browser to open the page, scroll it N times, and parse all the data from the page.
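Whichever route you pick, it helps to know what you are asking for. The period1 and period2 values in the question's URL look like Unix timestamps in seconds; decoded as UTC they are 2016-10-12 07:00 and 2019-10-12 07:00, exactly three years apart. A minimal sketch for building such a query string yourself, assuming the parameter names from the question's URL (the helper names to_epoch and history_url are made up for illustration, and the endpoint that actually returns the data may use different parameters):

```python
from datetime import datetime, timezone
from urllib.parse import urlencode


def to_epoch(year, month, day, hour=0):
    """Return the Unix timestamp (seconds) for a UTC date and hour."""
    return int(datetime(year, month, day, hour, tzinfo=timezone.utc).timestamp())


def history_url(symbol, start, end):
    """Build a history-page URL with the same parameters as the question's URL."""
    params = {
        "period1": start,
        "period2": end,
        "interval": "1d",
        "filter": "history",
        "frequency": "1d",
    }
    return f"https://finance.yahoo.com/quote/{symbol}/history?{urlencode(params)}"


# 07:00 UTC reproduces the exact values used in the question
start = to_epoch(2016, 10, 12, 7)
end = to_epoch(2019, 10, 12, 7)
url = history_url("AAPL", start, end)
print(url)
```

This only confirms that the request covers three years; it does not change how much of the table the static HTML contains.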

Also inspect the page source closely: all the data may already be there as JSON or in hidden HTML elements.
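If the data does turn out to be embedded as JSON in a script tag, you can pull it out without any scrolling. A rough sketch of the idea, run against a made-up page: the variable name window.pageData and the JSON layout are hypothetical, so check the real page source for the actual names before relying on this.

```python
import json
import re

# Hypothetical page source; the variable name and JSON structure are invented
SAMPLE_HTML = """
<html><body>
<script>
window.pageData = {"prices": [{"date": "2019-10-04", "close": 227.01}, {"date": "2019-10-03", "close": 220.82}]};
</script>
</body></html>
"""


def extract_embedded_json(html, var_name):
    """Return the JSON value assigned to var_name in the HTML, or None if absent."""
    match = re.search(re.escape(var_name) + r"\s*=\s*", html)
    if match is None:
        return None
    # raw_decode parses one JSON value starting at the given index
    # and ignores whatever follows it (here the trailing ";")
    value, _end = json.JSONDecoder().raw_decode(html, match.end())
    return value


page_data = extract_embedded_json(SAMPLE_HTML, "window.pageData")
for entry in page_data["prices"]:
    print(entry["date"], entry["close"])
```

Using raw_decode instead of a regular expression for the JSON body avoids having to guess where the object ends.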

Alex K.
  • Could you give me some pointers on how to solve this issue? Much appreciated! – Gladies Chang Oct 13 '19 at 02:55
  • I highly recommend using the API, but don't use the deprecated one. For the latest API, see https://stackoverflow.com/questions/49705047/downloading-mutliple-stocks-at-once-from-yahoo-finance-python. You can also try Selenium for the scroll function, if you still don't want to use the API. – QuantStats Oct 13 '19 at 08:26
  • Check the link posted by @QuantStats; it references modules with examples that may be useful. – Alex K. Oct 13 '19 at 15:36