30

I'm still new to Python, Machine Learning and TensorFlow, but doing my best to jump right in head-first. I could use some help though.

My data is currently in a Pandas dataframe. How can I convert this to a TensorFlow object? I've tried

dataVar_tensor = tf.constant(dataVar)
depth_tensor = tf.constant(depth)

But I get an error: [15780 rows x 9 columns] - got shape [15780, 9], but wanted [].

I'm sure this is probably a straightforward question, but I could really use the help.

Many thanks

P.S. I'm running TensorFlow 0.12 with Anaconda Python 3.5 on Windows 10.

jlt199
  • What do you want to do with this data? Is it the input for a neural network you want to train? From the error message, it looks like constant just wants a constant, so an int or a float, not a matrix – rAyyy Feb 17 '17 at 02:41
  • @rAyyy Yes, my plan is to eventually input it into a Neural Network. At the moment I'm simply trying to take the MNIST example from the tutorial and make it work on my own data. Which I'm reading in from a csv file using pandas.read_csv() – jlt199 Feb 17 '17 at 15:12

6 Answers

18

Here is one solution I found that works on Google Colab:

import pandas as pd
import tensorflow as tf
# Read the file into a pandas DataFrame
data = pd.read_csv('filedir')
# Convert the DataFrame to a tensor
data = tf.convert_to_tensor(data)
type(data)

This will print something like:

tensorflow.python.framework.ops.Tensor
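
A CSV file often contains text columns alongside numeric ones, and a tensor has a single dtype. The following is a minimal sketch, not part of the original answer, that assumes a small made-up DataFrame in place of the CSV file (the column names are hypothetical). It keeps only the numeric columns and casts them to float32 before converting:

```python
import pandas as pd
import tensorflow as tf

# Hypothetical stand-in for pd.read_csv(...): one text column, two numeric ones
df = pd.DataFrame({"name": ["a", "b", "c"], "x": [1, 2, 3], "y": [0.5, 1.5, 2.5]})

# Keep only the numeric columns, since a tensor holds a single dtype
numeric = df.select_dtypes(include="number")

# Cast to float32 while converting to a NumPy array, then to a tensor
tensor = tf.convert_to_tensor(numeric.to_numpy(dtype="float32"))
print(tensor.shape, tensor.dtype)
```

Going through to_numpy() first makes the dtype explicit, which tends to matter once the tensor is fed to a model that expects float32.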
Pikamander2
Thedarknight
12

I've converted my Pandas dataframe to a NumPy array using df.values.

Now, using

dataVar_tensor = tf.constant(dataVar, dtype=tf.float32, shape=[15780, 9])
depth_tensor = tf.constant(depth, dtype=tf.float32, shape=[15780, 1])

seems to work. I can't say so definitively because I have other hurdles to overcome before my code runs, but it's hopefully a step in the right direction. Thanks for all your help.

As an aside, my attempts to get the tutorial working on my own data continue in my next question, "Converting TensorFlow tutorial to work with my own data".
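
For reference, here is a small sketch of the array preparation step on its own, using only pandas and NumPy. The frame and its column names ("x1", "x2", "depth") are made up for illustration and are not the asker's data. It shows how to check the dtype and shape of the arrays before handing them to tf.constant:

```python
import numpy as np
import pandas as pd

# Hypothetical frame: two feature columns and one target column named "depth"
df = pd.DataFrame({"x1": [0.1, 0.2, 0.3], "x2": [1, 2, 3], "depth": [10, 20, 30]})

# Double brackets select a DataFrame, so the result stays two-dimensional
features = df[["x1", "x2"]].to_numpy(dtype=np.float32)
target = df[["depth"]].to_numpy(dtype=np.float32)

print(features.shape, features.dtype)
print(target.shape, target.dtype)
```

tf.constant infers the shape from a NumPy array, so once the arrays already have the intended shape, the shape argument can be left out.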

Nicolas Gervais
jlt199
  • I converted a pandas Series (y_train) of ints to a tensor and then to one-hot as follows: `dataVar_tensor = tf.Variable(y_train.as_matrix(), dtype=tf.int32)` followed by `result = tf.one_hot(dataVar_tensor, depth)` – Vaibhav Dec 05 '17 at 06:47
  • pandas.DataFrame.values is indeed what is suggested on TensorFlow tutorial https://www.tensorflow.org/tutorials/load_data/pandas_dataframe#load_data_using_tfdatadataset – toliveira Nov 08 '20 at 15:09
4

The following works with a NumPy array as input:

import tensorflow as tf
import numpy as np
a = np.array([1,2,3])
with tf.Session() as sess:
    tf.global_variables_initializer().run()

    dataVar = tf.constant(a)
    print(dataVar.eval())

-> [1 2 3]

Don't forget to start the session and run() or eval() your tensor object to see its content; otherwise it will just give you its generic description.

I suspect that since your data is in a DataFrame rather than a simple array, you need to experiment with the shape parameter of tf.constant(), which you are currently not specifying, to help it understand the dimensionality of the DataFrame and deal with its indices, etc.
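
The session-based code above targets TensorFlow 1.x and earlier. As a sketch under the assumption that TensorFlow 2.x is installed (eager execution is the default there, and tf.Session is no longer part of the top-level API), the same check could look like this:

```python
import numpy as np
import tensorflow as tf

a = np.array([1, 2, 3])

# In TF 2.x the tensor is evaluated eagerly, so no session is needed
data_var = tf.constant(a)

# .numpy() returns the tensor's contents as a NumPy array
print(data_var.numpy())
```

This does not apply to the asker's TensorFlow 0.12 setup, where the session version is the one to use.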

VS_FF
  • Thanks. I'm running an InteractiveSession and I've tried several different variations of `dataVar_tensor = tf.constant(dataVar, dtype = tf.float32, shape=[15780,9])` but so far no luck – jlt199 Feb 17 '17 at 15:23
2

You can convert a dataframe column to a tensor object like so:

tf.constant(df['column_name'])

This should return a tensor that looks something like this:

<tf.Tensor: id=275634, shape=(48895,), dtype=float64, numpy=
array([1, 2, ...])>

Also, you can add as many dataframe columns as you want, like so:

tf.constant([df['column1'], df['column2']])
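
One thing to watch with a list of columns is the resulting orientation: each column becomes a row of the tensor. The sketch below uses a made-up two-column frame and stacks the columns through NumPy first (an assumption made to keep the example simple, not the exact call above) to compare that layout with the usual rows-by-columns layout:

```python
import numpy as np
import pandas as pd
import tensorflow as tf

# Hypothetical frame with the two column names used above
df = pd.DataFrame({"column1": [1.0, 2.0, 3.0], "column2": [4.0, 5.0, 6.0]})

# A list of columns stacks along the first axis: one row per column
by_column = tf.constant(np.array([df["column1"], df["column2"]]))

# Selecting the columns as a DataFrame keeps one row per record
by_row = tf.constant(df[["column1", "column2"]].to_numpy())

print(by_column.shape)
print(by_row.shape)
```

If the model expects one row per sample, the second form (or a transpose of the first) is probably the one you want.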

Hope this helps.

praveen kumar
0

hottbox.pdtools.utils (the Pandas integration tools of the HOTTBOX API) provides the functions

   pd_to_tensor(df[, keep_index])
   tensor_to_pd(tensor[, col_name])

for conversion in both directions.

StefanQ
0

You can use tf.estimator.inputs.pandas_input_fn in your make_input_fn(X, y, num_epochs) function. I've not managed to get it to work with a multi-index, however. I worked around this by converting the index to a standard integer index with df.reset_index(drop=True).
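
The index workaround can be tried on its own with pandas. This is a small sketch on a made-up multi-indexed frame (the level names "group" and "step" are hypothetical); it does not call pandas_input_fn itself:

```python
import pandas as pd

# Hypothetical frame with a two-level index
index = pd.MultiIndex.from_tuples(
    [("a", 1), ("a", 2), ("b", 1)], names=["group", "step"]
)
df = pd.DataFrame({"x": [0.1, 0.2, 0.3]}, index=index)

# drop=True discards the index levels and leaves a plain 0..n-1 integer index
flat = df.reset_index(drop=True)

print(flat.index.tolist())
print(flat.columns.tolist())
```

Note that drop=True throws the index values away. Calling reset_index() without it keeps the levels as ordinary columns, which may be preferable if they carry information the model needs.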

Heather Walker