I'm using TensorFlow and its tf.learn API to create and train a DNNRegressor model. I have an integer feature column that is multivalent (each row can hold more than one integer value in that column), and I use tf.contrib.layers.sparse_column_with_integerized_feature for it.

My question is: what is the right delimiter for a multivalent feature column in a CSV file? For example, suppose I have a CSV in which col2 is a multivalent feature and is not a one-hot feature:

  1, 2, 1:2:3:4, 5
  2, 1, 4:5, 6

As you can see, I use ':' to separate the integer values in col2, but that doesn't seem to be right: I get this error when running DNNRegressor with this feature column declared as tf.contrib.layers.sparse_column_with_integerized_feature:

 'Value passed to parameter 'x' has DataType string not in list of allowed 
  values: int32, int64, float32, float64'.

I'd really appreciate your help.

1 Answer

tf.contrib.layers.sparse_column_with_integerized_feature accepts int32 or int64 values only, whereas a cell such as 1:2:3:4 is read from the CSV as a string, so it won't work as-is.

But TensorFlow supports multi-dimensional numeric columns, so you can use tf.feature_column.numeric_column and specify the shape of your data. Note that TensorFlow expects every row to have the same shape, so you'll need to pad all of your values to a common shape.
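As a rough sketch of the padding step, assuming the col2 values have already been split into lists of integers: the helper name pad_to_common_length and the pad value 0 are my own choices for illustration, and you should pick a pad value that cannot collide with a real feature value in your data.

```python
def pad_to_common_length(rows, pad_value=0):
    """Right-pad each list in `rows` with `pad_value` up to the longest length."""
    width = max(len(row) for row in rows)
    return [row + [pad_value] * (width - len(row)) for row in rows]

# The col2 values from the question, already split on ':'.
col2 = [[1, 2, 3, 4], [4, 5]]
padded = pad_to_common_length(col2)
print(padded)  # [[1, 2, 3, 4], [4, 5, 0, 0]]

# Every row now has length 4, so the matching column would be declared
# with that shape, e.g. tf.feature_column.numeric_column('col2', shape=(4,)).
```

Keep in mind that a numeric column treats the values as numbers rather than as category ids, so this only makes sense if that reading fits your feature.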

The colon ':' delimiter is fine for multivalent columns. Here's an example of how to read multiple values into a DataFrame with pandas (that question is about XML, but the same approach works for CSV). You can then feed this DataFrame to model.train() through its input_fn.
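For the CSV in the question, a minimal sketch of that parsing step could look like this. The column names col0 to col3 and the helper split_ints are hypothetical, and the data is read from an in-memory string only to keep the example self-contained.

```python
import io

import pandas as pd

# The two rows from the question; the third column holds colon-separated integers.
csv_text = "1, 2, 1:2:3:4, 5\n2, 1, 4:5, 6\n"

def split_ints(cell):
    """Turn a cell such as '1:2:3:4' into [1, 2, 3, 4]."""
    return [int(part) for part in cell.strip().split(":")]

df = pd.read_csv(
    io.StringIO(csv_text),
    header=None,
    names=["col0", "col1", "col2", "col3"],  # hypothetical column names
    skipinitialspace=True,                   # drop the space after each comma
    converters={"col2": split_ints},         # parse the multivalent column
)

print(df["col2"].tolist())
```

With a real file you would pass its path to pd.read_csv instead of the io.StringIO wrapper.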

Maxim