I am new to Python and have just been through a couple of books and tutorials on data analysis and machine learning.
I want to build a classifier and am trying to scrape real-time stock data.
I am using the following function to pull the real-time data:
from googlefinance import getQuotes
import json
import pandas as pd
import datetime
import requests
def get_intraday_data(symbol, interval_seconds=301, num_days=10):
    # Specify URL string based on function inputs.
    url_string = 'http://www.google.com/finance/getprices?q={0}'.format(symbol.upper())
    url_string += "&i={0}&p={1}d&f=d,o,h,l,c,v".format(interval_seconds, num_days)
    # Request the text, and split by each line
    r = requests.get(url_string).text.split()
    # Split each line by a comma, starting at the 8th line
    r = [line.split(',') for line in r[7:]]
    # Save data in Pandas DataFrame
    df = pd.DataFrame(r, columns=['Datetime', 'Close', 'High', 'Low', 'Open', 'Volume'])
    # Convert UNIX to Datetime format (x[1:] drops the first character of the field)
    df['Datetime'] = df['Datetime'].apply(lambda x: datetime.datetime.fromtimestamp(int(x[1:])))
    return df
When I try to use df outside the function, I get the following error:
---------------------------------------------------------------------------
NameError Traceback (most recent call last)
<ipython-input-40-db884686c2f6> in <module>()
18 return df
19
---> 20 symbol = pd.DataFrame(df)
NameError: name 'df' is not defined
The issue is that I want to be able to store the result in a separate DataFrame and use it later. The function appears to run but does not store the result anywhere. I would appreciate guidance on this.
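For illustration, here is a minimal sketch of the behaviour described, under the assumption that the NameError comes from `df` being a local name inside the function. `make_frame` and `quotes` are hypothetical names, and the rows are made-up placeholder values rather than real quotes, so that the sketch runs without any network request.

```python
import pandas as pd


def make_frame():
    # Hypothetical stand-in for get_intraday_data: builds a small DataFrame
    # from made-up rows instead of requesting anything over the network.
    df = pd.DataFrame(
        [[10.0, 100], [10.5, 150]],
        columns=["Close", "Volume"],
    )
    # `df` is a local name; it exists only while the function is running.
    return df


# Calling the function without assigning the result discards the DataFrame.
make_frame()

# Assigning the return value to a name keeps it available for later use.
quotes = make_frame()
print(quotes.shape)
```

If that assumption holds, the same pattern would apply to the function in the question: assign the value returned by `get_intraday_data(...)` to a name at the call site and work with that name afterwards, instead of referring to `df` outside the function.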