Passing Genrator function to TF-Hub Universal sentence Encoder from pandas dataframe

Question

I have a pandas dataframe in which one column contains text body of an Email, I am trying to encode it using this tutorial. I have managed to encode the sentences, by

module_url = "https://tfhub.dev/google/universal-sentence-encoder-large/3"
embed = hub.Module(module_url)
tf.logging.set_verbosity(tf.logging.ERROR)
messages = df['EmailBody'].tolist()[:50] #Why 50 explained below
with tf.Session() as session:
  session.run([tf.global_variables_initializer(), tf.tables_initializer()])
  message_embeddings = session.run(embed(messages))

Now if I increase the size from here, it starts leaking out memory, I also tried running it in batches by

module_url = "https://tfhub.dev/google/universal-sentence-encoder-large/3"
embed = hub.Module(module_url)
tf.logging.set_verbosity(tf.logging.ERROR)
messages = df_RF_final['Preprocessed_EmailBody'].tolist()
message_embeddings = []
with tf.Session() as session:
  session.run([tf.global_variables_initializer(), tf.tables_initializer()])
  for i in range(int(len(messages)/100)):
    message_embeddings.append(session.run(embed(messages[i*100:(1+1)*200])))

Which gave the error available at the bottom, I am looking for an implementation where rather than having to pass the list I can pass a generator function, if it is not possible to use a generator function, then gelp me fix the second approach.

Error stack

---------------------------------------------------------------------------
InvalidArgumentError                      Traceback (most recent call last)
/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py in _do_call(self, fn, *args)
   1333     try:
-> 1334       return fn(*args)
   1335     except errors.OpError as e:

/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py in _run_fn(feed_dict, fetch_list, target_list, options, run_metadata)
   1318       return self._call_tf_sessionrun(
-> 1319           options, feed_dict, fetch_list, target_list, run_metadata)
   1320 

/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py in _call_tf_sessionrun(self, options, feed_dict, fetch_list, target_list, run_metadata)
   1406         self._session, options, feed_dict, fetch_list, target_list,
-> 1407         run_metadata)
   1408 

InvalidArgumentError: Requires start <= limit when delta > 0: 0/-2147483648
     [[{{node module_3_apply_default_4/Encoder_en/Transformer/SequenceMask/Range}}]]

During handling of the above exception, another exception occurred:

InvalidArgumentError                      Traceback (most recent call last)
<ipython-input-13-d0c75bde4d87> in <module>()
      7   session.run([tf.global_variables_initializer(), tf.tables_initializer()])
      8   for i in range(int(len(messages)/100)):
----> 9     message_embeddings.append(session.run(embed(messages[i*100:(1+1)*200])))

/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py in run(self, fetches, feed_dict, options, run_metadata)
    927     try:
    928       result = self._run(None, fetches, feed_dict, options_ptr,
--> 929                          run_metadata_ptr)
    930       if run_metadata:
    931         proto_data = tf_session.TF_GetBuffer(run_metadata_ptr)

/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py in _run(self, handle, fetches, feed_dict, options, run_metadata)
   1150     if final_fetches or final_targets or (handle and feed_dict_tensor):
   1151       results = self._do_run(handle, final_targets, final_fetches,
-> 1152                              feed_dict_tensor, options, run_metadata)
   1153     else:
   1154       results = []

/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py in _do_run(self, handle, target_list, fetch_list, feed_dict, options, run_metadata)
   1326     if handle is None:
   1327       return self._do_call(_run_fn, feeds, fetches, targets, options,
-> 1328                            run_metadata)
   1329     else:
   1330       return self._do_call(_prun_fn, handle, feeds, fetches)

/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py in _do_call(self, fn, *args)
   1346           pass
   1347       message = error_interpolation.interpolate(message, self._graph)
-> 1348       raise type(e)(node_def, op, message)
   1349 
   1350   def _extend_graph(self):

InvalidArgumentError: Requires start <= limit when delta > 0: 0/-2147483648
     [[node module_3_apply_default_4/Encoder_en/Transformer/SequenceMask/Range (defined at /usr/local/lib/python3.6/dist-packages/tensorflow_hub/native_module.py:514) ]]

Caused by op 'module_3_apply_default_4/Encoder_en/Transformer/SequenceMask/Range', defined at:
  File "/usr/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/usr/local/lib/python3.6/dist-packages/ipykernel_launcher.py", line 16, in <module>
    app.launch_new_instance()
  File "/usr/local/lib/python3.6/dist-packages/traitlets/config/application.py", line 658, in launch_instance
    app.start()
  File "/usr/local/lib/python3.6/dist-packages/ipykernel/kernelapp.py", line 477, in start
    ioloop.IOLoop.instance().start()
  File "/usr/local/lib/python3.6/dist-packages/tornado/ioloop.py", line 888, in start
    handler_func(fd_obj, events)
  File "/usr/local/lib/python3.6/dist-packages/tornado/stack_context.py", line 277, in null_wrapper
    return fn(*args, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/zmq/eventloop/zmqstream.py", line 450, in _handle_events
    self._handle_recv()
  File "/usr/local/lib/python3.6/dist-packages/zmq/eventloop/zmqstream.py", line 480, in _handle_recv
    self._run_callback(callback, msg)
  File "/usr/local/lib/python3.6/dist-packages/zmq/eventloop/zmqstream.py", line 432, in _run_callback
    callback(*args, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/tornado/stack_context.py", line 277, in null_wrapper
    return fn(*args, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/ipykernel/kernelbase.py", line 283, in dispatcher
    return self.dispatch_shell(stream, msg)
  File "/usr/local/lib/python3.6/dist-packages/ipykernel/kernelbase.py", line 235, in dispatch_shell
    handler(stream, idents, msg)
  File "/usr/local/lib/python3.6/dist-packages/ipykernel/kernelbase.py", line 399, in execute_request
    user_expressions, allow_stdin)
  File "/usr/local/lib/python3.6/dist-packages/ipykernel/ipkernel.py", line 196, in do_execute
    res = shell.run_cell(code, store_history=store_history, silent=silent)
  File "/usr/local/lib/python3.6/dist-packages/ipykernel/zmqshell.py", line 533, in run_cell
    return super(ZMQInteractiveShell, self).run_cell(*args, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/IPython/core/interactiveshell.py", line 2718, in run_cell
    interactivity=interactivity, compiler=compiler, result=result)
  File "/usr/local/lib/python3.6/dist-packages/IPython/core/interactiveshell.py", line 2822, in run_ast_nodes
    if self.run_code(code, result):
  File "/usr/local/lib/python3.6/dist-packages/IPython/core/interactiveshell.py", line 2882, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-13-d0c75bde4d87>", line 9, in <module>
    message_embeddings.append(session.run(embed(messages[i*100:(1+1)*200])))
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_hub/module.py", line 247, in __call__
    name=name)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_hub/native_module.py", line 514, in create_apply_graph
    import_scope=relative_scope_name)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/training/saver.py", line 1435, in import_meta_graph
    meta_graph_or_file, clear_devices, import_scope, **kwargs)[0]
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/training/saver.py", line 1457, in _import_meta_graph_with_return_elements
    **kwargs))
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/meta_graph.py", line 806, in import_scoped_meta_graph_with_return_elements
    return_elements=return_elements)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/util/deprecation.py", line 507, in new_func
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/importer.py", line 442, in import_graph_def
    _ProcessNewOps(graph)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/importer.py", line 235, in _ProcessNewOps
    for new_op in graph._add_new_tf_operations(compute_devices=False):  # pylint: disable=protected-access
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/ops.py", line 3433, in _add_new_tf_operations
    for c_op in c_api_util.new_tf_operations(self)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/ops.py", line 3433, in <listcomp>
    for c_op in c_api_util.new_tf_operations(self)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/ops.py", line 3325, in _create_op_from_tf_operation
    ret = Operation(c_op, self)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/ops.py", line 1801, in __init__
    self._traceback = tf_stack.extract_stack()

InvalidArgumentError (see above for traceback): Requires start <= limit when delta > 0: 0/-2147483648
     [[node module_3_apply_default_4/Encoder_en/Transformer/SequenceMask/Range (defined at /usr/local/lib/python3.6/dist-packages/tensorflow_hub/native_module.py:514) ]]

score 0 · Answer 1 · answered Sep 17 '19 at 08:12

I had a similar issue and is very similar to "Strongly increasing memory consumption when using ELMo from Tensorflow-Hub". I got a great answer from arnoegw here.

In short, since you are using tf.session you are using the TF.v1 programing model. This means you first build the dataflow and then run it repeatedly i.e. feeding it inputs and fetching outputs from the graph. But you keep adding the new application of hub.Module to the graph. Instead of doing it once and using the graph.

The correct way to implement this according to the answer is:

tf.logging.set_verbosity(tf.logging.ERROR)
messages = df_RF_final['Preprocessed_EmailBody'].tolist()
message_embeddings = []
with hub.eval_function_for_module("https://tfhub.dev/google/universal-sentence-encoder-large/3") as embed:    
        for i in range(int(len(messages)/100)):
            message_embeddings.append(embed(messages[i*100:(1+1)*200])

Passing Genrator function to TF-Hub Universal sentence Encoder from pandas dataframe

1 Answers1