Update
While I appreciated AloneTogether's answer, I didn't like that I was using take() and that it was separate from model.fit.
I put another answer here if you want to look at it. It involves subclassing Model. It's not too bad.
End of Update
I have a simple example: a parquet file with 8 columns named feature_#, each populated with the values 1 to 100.
feature_1  feature_2  ...  feature_8
        1          1  ...          1
        2          2  ...          2
      ...        ...  ...        ...
       99         99  ...         99
      100        100  ...        100
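(For reference, a file like this can be produced with pandas; the path below is just a placeholder, and data_dir in the later snippets is assumed to point at the same location, typically as a file:// URL for petastorm.)

import os
import pandas as pd

out_dir = "/tmp/data"                      # placeholder location
os.makedirs(out_dir, exist_ok=True)
# 100 rows where every feature_# column holds 1..100
df = pd.DataFrame({f"feature_{i}": list(range(1, 101)) for i in range(1, 9)})
df.to_parquet(os.path.join(out_dir, "df_100.parquet"))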
my model:
import tensorflow as tf
from tensorflow.keras.layers import Input, Concatenate, Dense

all_cols = ["feature_1","feature_2","feature_3","feature_4","feature_5","feature_6","feature_7","feature_8"]
x_cols = ["feature_1","feature_2","feature_3","feature_4","feature_5","feature_6","feature_7"]

# one scalar input per feature column, concatenated into a single vector
inputs = [Input(shape=(1,), name=col) for col in x_cols]
merged = Concatenate(axis=1)(inputs)
x = Dense(50, activation="relu")(merged)
x = Dense(20, activation="relu")(x)
outputs = Dense(101, activation="softmax")(x)
model = tf.keras.Model(inputs=inputs, outputs=outputs)

opt = tf.keras.optimizers.Adam(learning_rate=0.001)
model.compile(loss="sparse_categorical_crossentropy",
              optimizer=opt, metrics=["accuracy"])
I use petastorm like so:
from petastorm import make_batch_reader
from petastorm.tf_utils import make_petastorm_dataset

batch_size = 4

with make_batch_reader('%s/df_100.parquet' % data_dir, num_epochs=1,
                       schema_fields=all_cols) as train_reader:
    with make_batch_reader('%s/df_100.parquet' % data_dir, num_epochs=1,
                           schema_fields=all_cols) as val_reader:

        train_ds = make_petastorm_dataset(train_reader) \
            .unbatch() \
            .map(
                lambda x: (tuple(getattr(x, col) for col in x_cols), getattr(x, "feature_8"))
            ) \
            .batch(batch_size)

        val_ds = make_petastorm_dataset(val_reader) \
            .unbatch() \
            .map(
                lambda x: (tuple(getattr(x, col) for col in x_cols), getattr(x, "feature_8"))
            ) \
            .batch(batch_size)
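Each element this pipeline produces is a ((feature_1, ..., feature_7), feature_8) pair of batch-sized tensors; a quick, non-consuming way to confirm that (still inside the two with blocks) is:

# prints the nested TensorSpec structure without pulling any data
print(train_ds.element_spec)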
For this simple example I use the same data for training as for validation. I want to confirm that the whole dataset is going into model.fit(), so I wrote a custom callback:
class MyCustomCallback(tf.keras.callbacks.Callback):
    def __init__(self, train_data):
        super().__init__()
        self.mylist = []
        self.train_data = train_data

    def on_train_batch_begin(self, batch, logs=None):
        # peek at feature_1 of the first remaining batch
        print(list(self.train_data.take(1).as_numpy_iterator())[0][0][0])

# and I pass the dataset to the custom callback:
callbacks.append(MyCustomCallback(train_ds))
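For completeness, the datasets and callback are handed to fit roughly like this (still inside the two with blocks; the epochs/verbose values here are just illustrative):

model.fit(train_ds,
          validation_data=val_ds,
          epochs=1,
          callbacks=callbacks,
          verbose=2)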
This doesn't print all the values, 1 to 100. If I iterate over the dataset with a simple for loop (without model.fit) then I do get all of 1 to 100, so I think take() is competing with model.fit for the data. Just a theory.
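By a simple for loop I mean something like this, run on its own instead of model.fit:

# consumes the dataset directly; this prints every value from 1 to 100
for features, label in train_ds:
    print(features[0].numpy())   # first input column of each batch of 4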
I have also tried:
class MyCustomCallback(tf.keras.callbacks.Callback):
    def on_train_batch_begin(self, batch, logs=None):
        print(self.model.layers[0].input)   # or .output
        # or
        # print(self.model.layers[0].get_weights())
But this doesn't get me any real values, and get_weights() prints out empty arrays.
This is what printing the input gives:
KerasTensor(type_spec=TensorSpec(shape=(None, 1), dtype=tf.float32, name='feature_1'), name='feature_1', description="created by layer 'feature_1'")
I have also tried K.eval() on the input and output of the layer, and that ends up with a numpy error that isn't fixed by any of the eager-execution settings.
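A minimal sketch of that K.eval() attempt (the callback name here is just for illustration):

from tensorflow.keras import backend as K

class MyEvalCallback(tf.keras.callbacks.Callback):
    def on_train_batch_begin(self, batch, logs=None):
        # fails: the symbolic KerasTensor can't be evaluated to a numpy array
        print(K.eval(self.model.layers[0].input))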
I really don't think this should be so hard; I just want to peek at the dataset right before it goes into training.
I have fooled around with repeat(), cache(), and simply iterating over the dataset before model.fit, but I don't like that this peek happens before model.fit, and that unless the dataset is cached it gets reshuffled, and so on.
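The cache() variant I'm talking about looks roughly like this; the first pass fills the cache so that a later fit() call can replay the same elements:

train_ds = train_ds.cache()

# first pass: fills the cache and lets me eyeball every batch
for features, label in train_ds:
    print(features[0].numpy())

# a later model.fit(train_ds, ...) then reads from the cache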
But I also want to be able to look at the model arbitrarily: any value, any weight, at any time. I don't feel like I can access this stuff, but I feel like I should be able to.
Any help is appreciated.
Oh, and I'm using TensorFlow 2.6.2 with tf.keras at the moment.