
I am trying to train my model on preprocessed data stored in the Parquet format. I am using Keras to train an autoencoder model, and in order to load the data on the fly instead of loading the whole dataset into memory, I am using Petastorm.
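
For context, `scaled.parquet` is a plain Parquet file of scaled float features. A hypothetical sketch of producing such a file (my real preprocessing is not shown; the column names and row count here are placeholders):

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the real preprocessing: 15 scaled float32
# columns, matching the (6400, 1, 15) reshape used during training below.
df = pd.DataFrame(np.random.rand(100_000, 15).astype(np.float32),
                  columns=[f'f{i}' for i in range(15)])
df.to_parquet('/kaggle/working/scaled.parquet')
```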

Trouble is, the GPU utilization on the Kaggle notebook barely touches 5 or 6%, whereas I want close to full GPU utilization.

Below is the code I am using:

```python
import tensorflow as tf
from tensorflow.keras.callbacks import EarlyStopping
from petastorm import make_batch_reader
from petastorm.tf_utils import make_petastorm_dataset

early_stop = EarlyStopping(monitor='val_loss', patience=1)
with make_batch_reader('file:///kaggle/working/scaled.parquet', num_epochs=1, shuffle_row_groups=False) as train_reader:
    # Load a main batch of 6400 rows at a time from the Petastorm reader
    train_ds = make_petastorm_dataset(train_reader).unbatch().map(lambda x: tf.convert_to_tensor(x)).batch(6400, drop_remainder=True)
    print(type(train_ds))

    for ele in train_ds:
        tensor = tf.reshape(ele, (6400, 1, 15))
        # Train on 32-row sub-batches drawn from the main batch
        model.fit(tensor, tensor, batch_size=32, epochs=1, validation_split=0.1, callbacks=[early_stop], verbose=1)
```
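
For reference, `model` above is the autoencoder. A minimal placeholder along these lines makes the snippet runnable; the hidden sizes and activations here are illustrative assumptions, not my actual architecture:

```python
import tensorflow as tf
from tensorflow.keras import layers

# Placeholder autoencoder: 15 input features, matching the (6400, 1, 15)
# reshape above. The bottleneck size of 8 is an assumption.
inputs = tf.keras.Input(shape=(1, 15))
encoded = layers.Dense(8, activation='relu')(inputs)
decoded = layers.Dense(15, activation='sigmoid')(encoded)
model = tf.keras.Model(inputs, decoded)
model.compile(optimizer='adam', loss='mse')
```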

What have I tried so far:

I implemented the snippet below based on suggestions found online, but it made no difference to GPU utilization.

```python
gpus = tf.config.experimental.list_physical_devices('GPU')
if gpus:
    # Restrict TensorFlow to only use the first GPU
    try:
        tf.config.experimental.set_visible_devices(gpus[0], 'GPU')
        tf.config.experimental.set_memory_growth(gpus[0], True)
    except RuntimeError as e:
        print(e)
```

I also performed a test suggested in a Stack Overflow post, which confirms that the GPU is active.
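
The check was along the following lines (a generic device-placement sanity check, not necessarily the exact code from that post):

```python
import tensorflow as tf

# List visible GPUs and confirm an op is actually placed on one
print(tf.config.list_physical_devices('GPU'))

tf.debugging.set_log_device_placement(True)  # log which device each op runs on
a = tf.random.uniform((1000, 1000))
b = tf.random.uniform((1000, 1000))
c = tf.matmul(a, b)  # should land on /GPU:0 if the GPU is active
print(c.device)
```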

How do I increase GPU utilization and speed up training?
