
I'm facing an issue with my Databricks cluster configuration, and I'm not able to put my finger on where or why. I was trying to save a Keras model, and it isn't going well.

import pandas as pd
from keras.models import Sequential
from keras.layers import Dense

# split(',') yields strings, so cast to float for Keras
dataset = pd.DataFrame([item.split(',') for item in '''6,148,72,35,0,33.6,0.627,50,1
1,85,66,29,0,26.6,0.351,31,0
8,183,64,0,0,23.3,0.672,32,1
1,89,66,23,94,28.1,0.167,21,0
0,137,40,35,168,43.1,2.288,33,1'''.split('\n')]).astype(float)

X = dataset.iloc[:, 0:8]
y = dataset.iloc[:, 8]

model = Sequential()
model.add(Dense(12, input_dim=8, activation='relu'))
model.add(Dense(8, activation='relu'))
model.add(Dense(1, activation='sigmoid'))

model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
model.fit(X, y, epochs=3, batch_size=10)

accuracy = model.evaluate(X, y, verbose=0)
print(accuracy)

The issue is with saving the model. Can anyone help me understand what the error is about? I'm using Python 3.7.3 and Databricks Runtime 6.2 (includes Apache Spark 2.4.4, Scala 2.11).

model.save('/dbfs/FileStore/tables/temp/new_model.h5')

KeyError                                  Traceback (most recent call last)
/databricks/python/lib/python3.7/site-packages/keras/engine/saving.py in save_model(model, filepath, overwrite, include_optimizer)
    540         with H5Dict(filepath, mode='w') as h5dict:
--> 541             _serialize_model(model, h5dict, include_optimizer)
    542     elif hasattr(filepath, 'write') and callable(filepath.write):

/databricks/python/lib/python3.7/site-packages/keras/engine/saving.py in _serialize_model(model, h5dict, include_optimizer)
    160         for name, val in zip(weight_names, weight_values):
--> 161             layer_group[name] = val
    162     if include_optimizer and model.optimizer:

/databricks/python/lib/python3.7/site-packages/keras/utils/io_utils.py in __setitem__(self, attr, val)
    230             raise KeyError('Cannot set attribute. '
--> 231                            'Group with name "{}" exists.'.format(attr))
    232         if is_np:

KeyError: 'Cannot set attribute. Group with name "b\'dense_1/kernel:0\'" exists.'

During handling of the above exception, another exception occurred:

RuntimeError                              Traceback (most recent call last)
in
----> 1 model.save('/dbfs/FileStore/tables/temp/new_model.h5')

/databricks/python/lib/python3.7/site-packages/keras/engine/network.py in save(self, filepath, overwrite, include_optimizer)
   1150             raise NotImplementedError
   1151         from ..models import save_model
-> 1152         save_model(self, filepath, overwrite, include_optimizer)
   1153
   1154     @saving.allow_write_to_gcs

/databricks/python/lib/python3.7/site-packages/keras/engine/saving.py in save_wrapper(obj, filepath, overwrite, *args, **kwargs)
    447                 os.remove(tmp_filepath)
    448         else:
--> 449             save_function(obj, filepath, overwrite, *args, **kwargs)
    450
    451     return save_wrapper

/databricks/python/lib/python3.7/site-packages/keras/engine/saving.py in save_model(model, filepath, overwrite, include_optimizer)
    539         return
    540     with H5Dict(filepath, mode='w') as h5dict:
--> 541         _serialize_model(model, h5dict, include_optimizer)
    542     elif hasattr(filepath, 'write') and callable(filepath.write):
    543         # write as binary stream

/databricks/python/lib/python3.7/site-packages/keras/utils/io_utils.py in __exit__(self, exc_type, exc_val, exc_tb)
    368
    369     def __exit__(self, exc_type, exc_val, exc_tb):
--> 370         self.close()
    371
    372

/databricks/python/lib/python3.7/site-packages/keras/utils/io_utils.py in close(self)
    344     def close(self):
    345         if isinstance(self.data, h5py.Group):
--> 346             self.data.file.flush()
    347         if self._is_file:
    348             self.data.close()

/databricks/python/lib/python3.7/site-packages/h5py/_hl/files.py in flush(self)
    450         """
    451         with phil:
--> 452             h5f.flush(self.id)
    453
    454     @with_phil

h5py/_objects.pyx in h5py._objects.with_phil.wrapper()

h5py/_objects.pyx in h5py._objects.with_phil.wrapper()

h5py/h5f.pyx in h5py.h5f.flush()

RuntimeError: Unable to flush file's cached information (file write failed: time = Fri Jan 31 08:19:53 2020 , filename = '/dbfs/FileStore/tables/temp/new_model.h5', file descriptor = 9, errno = 95, error message = 'Operation not supported', buf = 0x6993c98, total write size = 320, bytes this sub-write = 320, bytes actually written = 18446744073709551615, offset = 800)

Itachi
  • I just tried your code (not on Databricks) and everything works fine for me. Could you wrap everything in one function and pass it to a process? https://stackoverflow.com/questions/2046603/is-it-possible-to-run-function-in-a-subprocess-without-threading-or-writing-a-se – Eliethesaiyan Jan 31 '20 at 09:28
  • What filesystem is /dbfs on? It could be that it does not support some low level operations required for HDF5 to work with it. – Dr. Snoopy Jan 31 '20 at 10:14
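
To check the filesystem question raised in the last comment, here is a minimal probe sketch (not from the original thread; the probe path is just a placeholder under the same FileStore directory). It tries the seek-and-rewrite pattern that HDF5 relies on and simply reports whatever errno the mount returns:

import os

probe_path = '/dbfs/FileStore/tables/temp/fuse_probe.bin'  # hypothetical test path

try:
    with open(probe_path, 'wb') as f:
        f.write(b'x' * 1024)   # initial sequential write
        f.seek(0)              # jump back to the start of the file
        f.write(b'y' * 16)     # random-access rewrite, the pattern HDF5 needs
    print('random-access write succeeded')
except OSError as e:
    # errno 95 (EOPNOTSUPP, 'Operation not supported') would match the traceback above
    print('write failed with errno', e.errno, '-', e.strerror)
finally:
    if os.path.exists(probe_path):
        os.remove(probe_path)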

2 Answers


I was finally able to save the model by saving it on the driver node only and then copying it over to DBFS (S3-backed)...

import shutil
import tensorflow as tf
from keras.models import load_model

# Save to the driver node's local disk first (Python's working directory),
# then copy the finished file over to DBFS
classification_model.save('news_dedup_model.h5')
shutil.copyfile('/databricks/driver/news_dedup_model.h5', '/dbfs/FileStore/tables/temp/nemish/news_dedup_model.h5')

# Loading from the DBFS FUSE path works fine
classification_model = load_model('/dbfs/FileStore/tables/temp/nemish/news_dedup_model.h5', custom_objects={'tf': tf})

I'm still unable to figure out why it won't save to DBFS directly, though.

Itachi

Because Keras model.save() doesn't support writing directly to a FUSE mount; doing so gives you the 'Operation not supported' error.

You need to write it to the driver node's local disk first (where Python's working directory is), then move it to the DBFS FUSE mount under '/dbfs/your/path/on/DBFS'.
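
A minimal sketch of that pattern (the paths and model name are placeholders, and this assumes the standard dbutils utility available in Databricks notebooks):

# Save to the driver's local filesystem first; HDF5 can do its random writes there
model.save('/tmp/new_model.h5')

# Then copy the finished file onto DBFS: 'file:/' targets the driver's local disk,
# 'dbfs:/' targets the distributed store
dbutils.fs.cp('file:/tmp/new_model.h5', 'dbfs:/FileStore/tables/temp/new_model.h5')

Loading the model back through the FUSE path ('/dbfs/...') works as shown in the other answer, since reading from the mount is supported.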

Jixin Jia