
I have read a CSV file using NumPy's `genfromtxt`:

csv_file = np.genfromtxt(args.dataset, delimiter=',', skip_header=1, usecols=(0, 1, 2, 3, 4, 5), dtype=None)

Question: How do I use `string_input_producer` to queue and batch this data?

T T
  • Tell something about the resulting `csv_file` array. dtype, shape? – hpaulj Jun 04 '17 at 22:20
  • @hpaulj `>>> type(csv_file) ` and `csv_file.shape` is `(37810,)`. The CSV has the following fields: `Filename, Annotation tag, Upper left corner X, Upper left corner Y, Lower right corner X, Lower right corner Y` – T T Jun 04 '17 at 22:44
  • How about `dtype`? That shape is 1d, so I suspect it is a structured array with multiple `fields`, not columns. `dtype=None` gives you this. What does `tensorflow` have to say about using structured arrays? – hpaulj Jun 05 '17 at 00:20
  • More on structured array - https://stackoverflow.com/q/44295375 – hpaulj Jun 05 '17 at 03:44

1 Answer


You can read a NumPy array from CSV as you do and chop it up into batches manually. However, TF has built-in functionality for reading from multiple CSV files and assembling rows into either randomized or sequential batches. You can read cells of varying data types and convert them to the types you need.

The working code to do this is discussed in this question: Converting TensorFlow tutorial to work with my own data

In a nutshell, the key functions you need are `tf.TextLineReader`, `tf.train.string_input_producer`, and `tf.train.shuffle_batch` or `tf.train.batch`, depending on your needs.

The only limitation of this method that I'm aware of is that all rows within the CSV file must have the same number of columns.
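As a minimal sketch of the manual route (reading the structured array and chopping it into shuffled batches yourself), assuming the six fields mentioned in the comments; the in-memory CSV text and its values are made up for illustration:

```python
import io
import numpy as np

# Hypothetical in-memory stand-in for the real CSV file: a header row plus
# the six fields described in the comments above.
csv_text = io.StringIO(
    "Filename,Annotation tag,ulX,ulY,lrX,lrY\n"
    "img0.png,car,10,20,110,120\n"
    "img1.png,bus,30,40,130,140\n"
    "img2.png,car,50,60,150,160\n"
    "img3.png,van,70,80,170,180\n"
)

# Same call as in the question; dtype=None makes genfromtxt infer a dtype
# per column, so the result is a 1-D *structured* array (one record per row,
# auto-named fields f0..f5), not a 2-D numeric array.
data = np.genfromtxt(csv_text, delimiter=',', skip_header=1,
                     usecols=(0, 1, 2, 3, 4, 5), dtype=None, encoding='utf-8')

# Manual shuffled batching: permute the row indices, then slice.
rng = np.random.default_rng(0)
batch_size = 2
order = rng.permutation(len(data))
batches = [data[order[i:i + batch_size]]
           for i in range(0, len(data), batch_size)]
```

Each element of `batches` is itself a structured array of `batch_size` records, so individual columns are still reachable by field name (e.g. `batches[0]['f0']` for the filenames).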

VS_FF