I am trying to build a NumPy array from a DataFrame so that I can use it as training data for TensorFlow. The function below takes candles for a stock price and builds a DataFrame with pandas. The DataFrame values are all floats, so I believe the dtype is float32 (correct me if I am wrong). How can I convert the output, without the header row of course, to a NumPy array for use with TensorFlow?

import pandas as pd

def some_function(candles):
    date_time = []
    open_lst = []
    high_lst = []
    low_lst = []
    close_lst = [] 
    volume_lst = []
    for item in candles:
        #print (item)
        t_time = float(item[0])/1000
        #print (t_time)
        #dt_obj = datetime.fromtimestamp(t_time)
        date_time.append(t_time)
        #date_time.append(dt_obj)
        open_lst.append(float(item[1]))
        high_lst.append(float(item[2]))
        low_lst.append(float(item[3]))
        close_lst.append(float(item[4]))
        volume_lst.append(float(item[5]))
    ## creating data frame 
    coin_data_frame = {
        'date_time' : date_time,
        'open'  : open_lst,
        'high'  : high_lst,
        'low'   : low_lst,
        'close' : close_lst,
        'volume': volume_lst,
    }
    df = pd.DataFrame(coin_data_frame , columns = [ 'date_time' , 'open' , 'high' , 'low' , 'close','volume' ])

    #print (df.head(5))


    ### the last 3.5 hours
    # self.df = df  # 'self' is undefined in a plain function; only valid inside a class method

    # shift 'close' up by 15 rows; the last 15 rows become NaN
    df['close'] = df['close'].shift(-15)
    df.set_index("date_time", inplace=True)

    # graph_df(df.head(10))
    print (df.tail(40))

output:

                   open      high       low     close    volume
 date_time                                                    
 1.592598e+09  0.001719  0.001720  0.001718  0.001720    342.21
 1.592598e+09  0.001719  0.001719  0.001718  0.001720   1217.08
 1.592599e+09  0.001719  0.001719  0.001718  0.001718    237.83
 1.592599e+09  0.001719  0.001719  0.001718  0.001718    228.67
 1.592599e+09  0.001719  0.001722  0.001718  0.001718   1690.65
 1.592600e+09  0.001721  0.001721  0.001719  0.001717   1251.64
 1.592600e+09  0.001719  0.001722  0.001719  0.001717   1625.74
 1.592600e+09  0.001721  0.001722  0.001720  0.001717    446.60
 1.592600e+09  0.001721  0.001721  0.001719  0.001716    372.68
 1.592601e+09  0.001720  0.001721  0.001719  0.001718    330.26
 1.592601e+09  0.001721  0.001722  0.001721  0.001718    475.65
 1.592601e+09  0.001721  0.001722  0.001720  0.001718    406.49
 1.592602e+09  0.001721  0.001721  0.001719  0.001719   1013.71
 1.592602e+09  0.001720  0.001721  0.001720  0.001720    602.16
 1.592602e+09  0.001721  0.001721  0.001720  0.001720    138.23
 1.592602e+09  0.001720  0.001721  0.001720       NaN    441.67
 1.592603e+09  0.001720  0.001721  0.001719       NaN    100.16
 1.592603e+09  0.001721  0.001721  0.001718       NaN   8551.14
 1.592603e+09  0.001718  0.001718  0.001716       NaN  28164.34
 1.592604e+09  0.001718  0.001719  0.001717       NaN  27695.52
 1.592604e+09  0.001718  0.001719  0.001715       NaN  17872.19
 1.592604e+09  0.001717  0.001717  0.001715       NaN   8310.23
 1.592605e+09  0.001717  0.001717  0.001715       NaN    754.65
 1.592605e+09  0.001717  0.001717  0.001716       NaN    695.99
 1.592605e+09  0.001716  0.001718  0.001716       NaN    921.44
 1.592606e+09  0.001718  0.001719  0.001717       NaN   1474.45
 1.592606e+09  0.001718  0.001720  0.001717       NaN   3991.33
 1.592606e+09  0.001718  0.001720  0.001717       NaN    457.34
 1.592606e+09  0.001719  0.001720  0.001718       NaN   1165.05
 1.592607e+09  0.001720  0.001720  0.001718       NaN   1786.93
Sec Team
  • Does this answer your question [convert-pandas-dataframe-to-numpy-array](https://stackoverflow.com/questions/13187778/convert-pandas-dataframe-to-numpy-array)? – MrNobody33 Jun 19 '20 at 23:08

1 Answer

Simply calling df.to_numpy() will give you the NumPy array you want (for pandas >= 0.24). On older versions the equivalent is df.values, which still works but which the pandas documentation now recommends replacing with to_numpy().
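
As a minimal sketch of that call (the small frame below is a made-up stand-in, not the data from the question): pandas stores Python floats as float64 by default, not float32, so an explicit cast is needed if float32 is what the network expects.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the candle DataFrame (values are illustrative only)
df = pd.DataFrame(
    {"open": [0.001719, 0.001720], "volume": [342.21, 1217.08]},
    index=pd.Index([1.592598e9, 1.592599e9], name="date_time"),
)

arr = df.to_numpy()                    # requires pandas >= 0.24
arr32 = df.to_numpy(dtype=np.float32)  # explicit cast to float32

print(arr.dtype, arr.shape)  # float64 (2, 2)
print(arr32.dtype)           # float32
```

Neither array contains the column names or the index; only the cell values are converted.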

Just make sure you have saved your "target" DataFrame column to a y vector beforehand, and call df.drop() to remove it from the DataFrame before converting to NumPy, so that it is not fed into your network by accident. Note that df.drop() returns a new DataFrame unless you pass inplace=True.

Also, this will not include the index (the date_time values) in the resulting array. I assume this is the behaviour you expect.
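
Putting the points above together, a rough sketch could look like the following. It assumes 'close' is the target column, and the tiny frame and its values are invented for illustration. Dropping the rows where the shifted 'close' is NaN is my assumption about what you want, since those rows have no target to train on.

```python
import numpy as np
import pandas as pd

# Hypothetical miniature of the frame from the question; 'close' has already
# been shifted, so the trailing rows hold NaN.
df = pd.DataFrame(
    {
        "open": [1.0, 2.0, 3.0, 4.0],
        "close": [1.5, 2.5, np.nan, np.nan],
        "volume": [10.0, 20.0, 30.0, 40.0],
    },
    index=pd.Index([100.0, 160.0, 220.0, 280.0], name="date_time"),
)

# Assumption: rows without a target value are discarded before training
clean = df.dropna(subset=["close"])

# Save the target first, then drop it from the features
y = clean["close"].to_numpy(dtype=np.float32)
X = clean.drop(columns=["close"]).to_numpy(dtype=np.float32)

print(X.shape, y.shape)  # (2, 2) (2,)
```

X holds only the 'open' and 'volume' values here; the date_time index is left out of both arrays.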

kyriakosSt
  • Does Python 2.7.17 (64-bit) support df.to_numpy()? – Sec Team Jun 19 '20 at 23:19
  • I think so, but I am not sure. The function was introduced in pandas 0.24, which according to the documentation is the last version to support Python 2.7. So as long as you have that version, it should work. My advice, however, would be to plan to migrate everything to Python 3 sooner rather than later, as such problems will keep coming up. I also don't see any reason why 64-bit would be a hindering factor. – kyriakosSt Jun 20 '20 at 00:17