
I have read a CSV file using NumPy's `genfromtxt`:

csv_file = np.genfromtxt(args.dataset, delimiter=',', skip_header=1, usecols=(0, 1, 2, 3, 4, 5), dtype=None)

Question: How do I use `string_input_producer` to queue and batch this data?

T T
  • Tell something about the resulting `csv_file` array. dtype, shape? – hpaulj Jun 04 '17 at 22:20
  • @hpaulj `>>> type(csv_file) ` and `csv_file.shape` is `(37810,)`. The CSV has the following fields: `Filename, Annotation tag, Upper left corner X, Upper left corner Y, Lower right corner X, Lower right corner Y` – T T Jun 04 '17 at 22:44
  • How about `dtype`? That shape is 1d, so I suspect it is a structured array with multiple `fields`, not columns. `dtype=None` gives you this. What does `tensorflow` have to say about using structured arrays? – hpaulj Jun 05 '17 at 00:20
  • More on structured array - https://stackoverflow.com/q/44295375 – hpaulj Jun 05 '17 at 03:44

1 Answer


You can read a NumPy array from CSV as you do and chop it up into batches manually. However, TF has built-in functionality for reading from multiple CSV files and assembling rows into either randomized or sequential batches. You can read cells of varying data types and convert them to the types you need.

The working code to do this is discussed in this question: Converting TensorFlow tutorial to work with my own data

In a nutshell, the key functions you need are `tf.TextLineReader`, `tf.train.string_input_producer`, and `tf.train.shuffle_batch` or `tf.train.batch`, depending on your needs.

The only limitation of this method that I'm aware of is that all rows within the CSV file must have the same number of columns.
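As a minimal sketch of the manual route (reading the structured array and chopping it into shuffled batches yourself), assuming the six fields mentioned in the comments; the in-memory CSV text and its values are made up for illustration:

```python
import io
import numpy as np

# Hypothetical in-memory stand-in for the real CSV file: a header row plus
# the six fields described in the comments above.
csv_text = io.StringIO(
    "Filename,Annotation tag,ulX,ulY,lrX,lrY\n"
    "img0.png,car,10,20,110,120\n"
    "img1.png,bus,30,40,130,140\n"
    "img2.png,car,50,60,150,160\n"
    "img3.png,van,70,80,170,180\n"
)

# Same call as in the question; dtype=None makes genfromtxt infer a dtype
# per column, so the result is a 1-D *structured* array (one record per row,
# auto-named fields f0..f5), not a 2-D numeric array.
data = np.genfromtxt(csv_text, delimiter=',', skip_header=1,
                     usecols=(0, 1, 2, 3, 4, 5), dtype=None, encoding='utf-8')

# Manual shuffled batching: permute the row indices, then slice.
rng = np.random.default_rng(0)
batch_size = 2
order = rng.permutation(len(data))
batches = [data[order[i:i + batch_size]]
           for i in range(0, len(data), batch_size)]
```

Each element of `batches` is itself a structured array of `batch_size` records, so individual columns are still reachable by field name (e.g. `batches[0]['f0']` for the filenames).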

VS_FF