I'm trying to read data from CSV files into TensorFlow, following the official guide:

https://www.tensorflow.org/versions/r0.7/how_tos/reading_data/index.html#filenames-shuffling-and-epoch-limits

The sample code in the official documentation looks like this:

col1, col2, col3, col4, col5 = tf.decode_csv(value, record_defaults=record_defaults)

To read the file, I need to know how many columns and rows it has beforehand, and if there are 1000 columns I would need to define 1000 variables like col1, col2, col3, col4, col5, ..., col1000. This doesn't look like an efficient way to read data.

My questions:

  1. What is the best way to read CSV files into TensorFlow?

  2. Is there any way to read from a database (such as MongoDB) in TensorFlow?

V Y

4 Answers

  1. You definitely don't need to define col1, col2, ..., col1000.

    Generally, you might do something like this (a fuller end-to-end sketch of the reading pipeline is given after this list):

    
    columns = tf.decode_csv(value, record_defaults=record_defaults)
    features = tf.pack(columns)
    do_whatever_you_want_to_play_with_features(features)
    
  2. I don't know of any off-the-shelf way to read data directly from MongoDB. You could write a short script to convert the data from MongoDB into a format that TensorFlow supports; I would recommend the binary TFRecord format, which is much faster to read than CSV records (a sketch of such a conversion script also follows this list). This is a good blog post about the topic, or you can implement a customized data reader yourself; see the official doc here.
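
To make point 1 concrete, here is a minimal sketch of the whole queue-based pipeline from the era of the docs linked in the question (TensorFlow 0.x/1.x); the file name data.csv and the layout of four float feature columns plus one integer label are assumptions made up for the example:

    import tensorflow as tf

    # A hypothetical CSV with four float feature columns and one integer label column.
    filename_queue = tf.train.string_input_producer(["data.csv"])

    reader = tf.TextLineReader()
    _, value = reader.read(filename_queue)

    # One default per column; the length of record_defaults fixes the number of
    # columns, so no col1 ... col1000 variables are ever needed.
    record_defaults = [[0.0], [0.0], [0.0], [0.0], [0]]
    columns = tf.decode_csv(value, record_defaults=record_defaults)

    features = tf.pack(columns[:-1])  # tf.pack was renamed tf.stack in TF >= 1.0
    label = columns[-1]

    with tf.Session() as sess:
        coord = tf.train.Coordinator()
        threads = tf.train.start_queue_runners(coord=coord)
        for _ in range(10):  # read ten rows as a smoke test
            print(sess.run([features, label]))
        coord.request_stop()
        coord.join(threads)

(In current TensorFlow versions the tf.data API is the usual way to read CSVs, but the structure above matches the documentation the question refers to.)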
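
For point 2, a minimal sketch of such a conversion script, assuming pymongo is installed and that each MongoDB document carries hypothetical "features" (a list of floats) and "label" (an integer) fields; the connection string, database, and collection names are also made up:

    import tensorflow as tf
    from pymongo import MongoClient  # assumes pymongo is installed

    client = MongoClient("mongodb://localhost:27017")
    collection = client["mydb"]["train"]  # hypothetical database/collection names

    # tf.python_io.TFRecordWriter is the TFRecord writer in the TF 0.x/1.x API.
    writer = tf.python_io.TFRecordWriter("train.tfrecords")
    for doc in collection.find():
        example = tf.train.Example(features=tf.train.Features(feature={
            "features": tf.train.Feature(
                float_list=tf.train.FloatList(value=doc["features"])),
            "label": tf.train.Feature(
                int64_list=tf.train.Int64List(value=[doc["label"]])),
        }))
        writer.write(example.SerializeToString())
    writer.close()

The resulting train.tfrecords file can then be read back with tf.TFRecordReader and tf.parse_single_example in the same kind of queue-based pipeline shown above.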

Lifu Huang

def func():
    return 1, 2, 3, 4      # returns a single tuple

b = func()

print(b)                   # (1, 2, 3, 4)

print([num for num in b])  # [1, 2, 3, 4]

Hi, this has nothing to do with TensorFlow; it's plain Python, and you don't need to define 1000 variables. tf.decode_csv returns a tuple (one tensor per column), which you can keep in a single variable, just as in the snippet above.

No idea on database handling; I think you can use Python and just feed the data to TensorFlow as arrays.

Hope this is helpful.

Aravind Pilla

Of course you can read randomly sorted batches of training data directly from MongoDB and feed them to TensorFlow. Below is my approach:

    for step in range(self.steps):
        pageNum = 1
        while True:
            # load one page/batch of training data from MongoDB
            trainArray, trainLabelsArray = loadBatchTrainDataFromMongo(****)
            if len(trainArray) == 0:
                logging.info("training data consumed up!")
                break
            logging.info("started to train")
            sess.run([model.train_op],
                     feed_dict={self.input: trainArray,
                                self.output: np.asarray(trainLabelsArray),
                                self.keep_prob: params['dropout_rate']})
            pageNum = pageNum + 1

You also need to preprocess the training data in MongoDB, for example by assigning each training document a random sort value... (a sketch of such a batch loader follows).
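
The loadBatchTrainDataFromMongo helper is not shown in the answer; a minimal sketch of what it might look like, assuming pymongo, documents with hypothetical "features" and "label" fields, and a precomputed random "rand" field that provides the random sort mentioned above (all of these names are made up):

    import numpy as np
    from pymongo import MongoClient  # assumes pymongo is installed

    client = MongoClient("mongodb://localhost:27017")
    collection = client["mydb"]["train"]  # hypothetical database/collection names

    def loadBatchTrainDataFromMongo(page_num, batch_size=128):
        # Page through the collection in the order of the precomputed 'rand' field.
        cursor = (collection.find({}, {"features": 1, "label": 1})
                            .sort("rand", 1)
                            .skip((page_num - 1) * batch_size)
                            .limit(batch_size))
        docs = list(cursor)
        if not docs:
            return [], []
        features = np.asarray([d["features"] for d in docs], dtype=np.float32)
        labels = [d["label"] for d in docs]
        return features, labels

Note that skip() gets slow on large collections; paging with a range query on the rand field scales better.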

zhao yufei

Is there any way to read from a database (such as MongoDB) in TensorFlow?

Try TFMongoDB, a dataset op for TensorFlow implemented in C++ that lets you connect to your MongoDB:

pip install tfmongodb

There's an example on the GitHub page showing how to read data. See also PyPI: TFMongoDB.

Wan B.