My time-related data is initially stored as integers in the following format:

1234 # corresponds to 12:34
2359 # corresponds to 23:59

1) The first option is to describe the time as a single numeric_column:

tf.feature_column.numeric_column(key="start_time", dtype=tf.int32)

2) Another option is to split the time into hours and minutes, as two separate feature columns:

tf.feature_column.numeric_column(key="start_time_hours", dtype=tf.int32)
tf.feature_column.numeric_column(key="start_time_minutes", dtype=tf.int32)

3) The third option is to keep a single feature column, but let TensorFlow know that it consists of two values (hours and minutes):

tf.feature_column.numeric_column(key="start_time", shape=2, dtype=tf.int32)

Does this split make sense, and what is the difference between options 2) and 3)?
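For options 2) and 3) the HHMM integer has to be split before it reaches the feature columns. A minimal sketch of that split in plain Python (the helper name split_hhmm is my own and is not part of TensorFlow; it assumes values are always in the HHMM form shown above):

```python
def split_hhmm(value):
    """Split an HHMM integer such as 1234 into (hours, minutes)."""
    hours, minutes = divmod(value, 100)
    # Reject values that cannot be a time of day, e.g. 2460 or 1299.
    if not (0 <= hours <= 23 and 0 <= minutes <= 59):
        raise ValueError("not a valid HHMM time: %r" % (value,))
    return hours, minutes


print(split_hhmm(1234))  # (12, 34)
print(split_hhmm(2359))  # (23, 59)
```

The two returned values could then feed either the two separate columns of option 2) or the single shape=2 column of option 3).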

As an additional question, I ran into problems decoding vector data from a CSV file:

1|1|FGTR|1|1|14,2|15,1|329|3|10|2013
1|1|LKJG|1|1|7,2|19,2|479|7|10|2013
1|1|LKJH|1|1|14,2|22,2|500|3|10|2013

How can I let TensorFlow know that "14,2" and "15,1" should be treated as tensors with shape=2?
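To make the desired result concrete, here is a rough sketch in plain Python of what I would like each row to decode to. The function name parse_row and the choice of positions 5 and 6 as the vector columns are my own assumptions, based only on the sample rows above:

```python
def parse_row(line, vector_columns=(5, 6)):
    """Parse one pipe-delimited row.

    vector_columns holds the 0-based positions of fields such as "14,2"
    that should become lists of two integers; other fields stay strings.
    """
    fields = line.strip().split("|")
    parsed = []
    for index, field in enumerate(fields):
        if index in vector_columns:
            parsed.append([int(part) for part in field.split(",")])
        else:
            parsed.append(field)
    return parsed


row = parse_row("1|1|FGTR|1|1|14,2|15,1|329|3|10|2013")
print(row[5], row[6])  # [14, 2] [15, 1]
```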

Edit 1:

I found a way to decode the "array"-like data from the CSV. In the train and evaluate functions I added a .map step that decodes the data for the affected columns:

dataset = tf.data.Dataset.from_tensor_slices((dict(features), labels)).map(parse_csv)

where parse_csv is implemented as:

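def parse_csv(features, label):
    # Split a string such as "14,2" on the comma, then convert the pieces to an int32 tensor [14, 2].
    parts = tf.string_split([features['start_time']], delimiter=',').values
    features['start_time'] = tf.string_to_number(parts, tf.int32)
    return features, label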

As for options 2) and 3), I think the difference between two separate columns and one column with shape=2 lies in how the weights are distributed.
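To illustrate what I mean, here is a small sketch in plain Python. It assumes a simple linear model and made-up weight values, and it reflects only my current understanding, which I have not confirmed: with two scalar columns each column gets its own single weight, while with one shape=2 column the same two weights sit together in one vector, so the prediction comes out the same.

```python
def linear_two_columns(hours, minutes, w_hours, w_minutes, bias):
    """Option 2): two scalar features, one weight each."""
    return hours * w_hours + minutes * w_minutes + bias


def linear_one_column(start_time, weights, bias):
    """Option 3): one feature of length 2 with a weight vector of length 2."""
    return sum(x * w for x, w in zip(start_time, weights)) + bias


# Hypothetical weights, chosen only for illustration.
a = linear_two_columns(12, 34, 0.5, 0.25, 1.0)
b = linear_one_column([12, 34], [0.5, 0.25], 1.0)
print(a == b)  # True
```

If this is right, the two options differ in how the weights are grouped and named rather than in what a linear model can learn, but I would appreciate confirmation.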
