
I am new to Python and have just worked through a couple of books and tutorials on data analysis and machine learning.

I want to build a classifier and am trying to scrape real-time stock data.

I use the following function to pull the real-time data:

from googlefinance import getQuotes
import json
import pandas as pd
import datetime
import requests

def get_intraday_data(symbol, interval_seconds=301, num_days=10):
    # Specify URL string based on function inputs.
    url_string = 'http://www.google.com/finance/getprices?q={0}'.format(symbol.upper())
    url_string += "&i={0}&p={1}d&f=d,o,h,l,c,v".format(interval_seconds,num_days)

    # Request the text, and split by each line
    r = requests.get(url_string).text.split()

    # Split each line by a comma, starting at the 8th line
    r = [line.split(',') for line in r[7:]]

    # Save data in Pandas DataFrame
    df = pd.DataFrame(r, columns=['Datetime', 'Close', 'High', 'Low', 'Open', 'Volume'])

    # Convert UNIX to Datetime format
    df['Datetime'] = df['Datetime'].apply(lambda x: datetime.datetime.fromtimestamp(int(x[1:])))

    return df

When I try to use df outside the function, I get the following error:

---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
<ipython-input-40-db884686c2f6> in <module>()
     18     return df
     19 
---> 20 symbol = pd.DataFrame(df)

NameError: name 'df' is not defined

The issue is that I want to store the result in a separate data frame and use it later. The function appears to run but does not store the result anywhere. I would appreciate guidance on this.

0xsegfault
I have tried this. It didn't address my problem: I still get the error saying df is not defined, even after adding a step to store the results in HDFS. @user2539738 – 0xsegfault Oct 07 '16 at 16:29

1 Answer


I'm not familiar enough with computer science terminology to explain this thoroughly, but basically, when you call a function that returns a value, you must assign that value to a variable if you want to use it afterwards.

df only exists inside your function (this is called scope). Once the function returns, the name df is gone.

You're calling the function like this:

get_intraday_data(symbol, 301,10)

So, after the function has run, the returned value is discarded because nothing refers to it.

Instead, do the following:

df = get_intraday_data(symbol, 301,10)

Then df is defined outside the function and you can work with it.
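Here is a minimal sketch of the same scope rule. It is only an illustration: make_rows is a hypothetical function, and a plain list stands in for the DataFrame so the example runs without any network access.

```python
def make_rows():
    # df is a local name: it exists only while make_rows is running
    df = [("AAPL", 10.0), ("AAPL", 10.5), ("AAPL", 10.2)]
    return df

# Calling the function without assigning the result discards the return value.
make_rows()

# No name df was created at the top level, so looking it up raises NameError.
try:
    df
    name_exists = True
except NameError:
    name_exists = False

# Assigning the return value to a name keeps it available for later use.
rows = make_rows()
print(name_exists, len(rows))
```

The name used outside the function does not have to match the one used inside it; what matters is that the result of the call is assigned to something.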

Alternatively, instead of returning the df, you can pickle it. At the end of get_intraday_data, replace the line "return df" with:

fname = 'file1.P'
df.to_pickle(fname)
return fname

Then any subsequent code has to read the pickled dataframe back in:

fname = get_intraday_data(symbol, 301,10)
df = pd.read_pickle(fname)
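
To check that the round trip works before wiring it into the real function, something like the following can be run on its own. It is a sketch under assumptions: save_frame is a hypothetical stand-in for get_intraday_data, the data is made up, and the file is written to a temporary directory rather than the working directory.

```python
import os
import tempfile

import pandas as pd

def save_frame(path):
    # Hypothetical stand-in for get_intraday_data: build a small frame,
    # pickle it to disk, and return the file name instead of the frame.
    df = pd.DataFrame({"Close": [10.0, 10.5], "Volume": [100, 250]})
    df.to_pickle(path)
    return path

# Write the pickle into a temporary directory (an assumption for this demo).
path = os.path.join(tempfile.mkdtemp(), "file1.P")
fname = save_frame(path)

# Later code loads the frame back from the file name that was returned.
restored = pd.read_pickle(fname)
print(restored)
```

Note that the return value still has to be assigned (here to fname), so this approach is mainly useful when the data needs to outlive the Python session.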
Mohammad Athar