
I have a custom tokenizer and want to use it for prediction in a production API. How do I save and download the tokenizer?

This is my code trying to save it:

import pickle
from tensorflow.python.lib.io import file_io

with file_io.FileIO('tokenizer.pickle', 'wb') as handle:
  pickle.dump(tokenizer, handle, protocol=pickle.HIGHEST_PROTOCOL)

There is no error, but I can't find the tokenizer file after saving it. Does that mean the code didn't work?

desertnaut
kaka
  • Where are you saving? Where are you looking for it? What's your current working directory? – desertnaut Jun 11 '21 at 11:30
  • On Google Colab, so it should be in Google Drive. But I tried searching for it and couldn't find it. – kaka Jun 11 '21 at 13:27
  • G drive itself has subfolders, and it is not even the default directory when working in Colab. Please include the relevant info – desertnaut Jun 11 '21 at 13:33
  • Sorry I don't understand.... everything is automatically saved in G Drive by default. I searched the whole G Drive which includes all subfolders. I don't know what information to provide. How do you define the path to save it from? So my code is fine and you think the tokenizer is saved but hidden somewhere? – kaka Jun 11 '21 at 14:26
  • `My Drive > Colab Notebooks` This is the folder where all notebooks are saved automatically; I didn't define any path. – kaka Jun 11 '21 at 14:36

1 Answer


Here is the situation, demonstrated with a simple file to separate the issue from irrelevant specifics such as pickle, TensorFlow, and tokenizers:

# Run in a new Colab notebook:
%pwd
/content
%ls
sample_data/

Let's save a simple file foo.npy:

import numpy as np
np.save('foo', np.array([1,2,3]))

%ls
foo.npy  sample_data/

At this stage, %ls should show tokenizer.pickle in your case instead of foo.npy.
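To check where a file saved with a relative name actually ends up, you can print its absolute path. This is a minimal sketch: the dictionary named `tokenizer` is only a stand-in for your real tokenizer object, and the built-in `open` is used in place of `file_io.FileIO` to keep the example free of TensorFlow.

```python
import os
import pickle

# Stand-in for the real tokenizer (an assumption for illustration);
# any picklable object is saved the same way.
tokenizer = {"hello": 1, "world": 2}

# A relative file name is resolved against the current working directory.
filename = "tokenizer.pickle"
with open(filename, "wb") as handle:
    pickle.dump(tokenizer, handle, protocol=pickle.HIGHEST_PROTOCOL)

# Show exactly where the file was written.
saved_path = os.path.abspath(filename)
print("Working directory:", os.getcwd())
print("File saved at:", saved_path)
print("Exists:", os.path.exists(saved_path))
```

In a fresh Colab notebook the working directory shown by %pwd above is /content, so the file would be expected there rather than in Google Drive.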

Now, Google Drive and Colab are not connected by default; you have to mount the drive first (it will ask for authorization):

from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive

After that, an %ls command will give:

%ls
drive/  foo.npy  sample_data/

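and you can now navigate (and save) inside drive/ (i.e. actually in your Google Drive), changing the path accordingly. Anything saved there can be retrieved later. Files saved directly under /content, such as foo.npy above, live only on the temporary Colab machine and never appear in Google Drive.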
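Saving into the mounted drive could look like the sketch below. The helper names `save_pickle` and `load_pickle` are hypothetical, the dictionary is a stand-in for the real tokenizer, and the path `/content/drive/MyDrive` is an assumption about where Colab exposes "My Drive" after mounting; confirm it with `%ls drive/` in your own session. When that folder is not present, the sketch falls back to a temporary directory so that it still runs outside Colab.

```python
import os
import pickle
import tempfile


def save_pickle(obj, directory, filename="tokenizer.pickle"):
    """Pickle obj into directory and return the absolute path written."""
    os.makedirs(directory, exist_ok=True)
    path = os.path.join(directory, filename)
    with open(path, "wb") as handle:
        pickle.dump(obj, handle, protocol=pickle.HIGHEST_PROTOCOL)
    return os.path.abspath(path)


def load_pickle(path):
    """Load and return the object pickled at path."""
    with open(path, "rb") as handle:
        return pickle.load(handle)


# Assumed location of "My Drive" after drive.mount('/content/drive').
DRIVE_DIR = "/content/drive/MyDrive"

# Fall back to a temporary directory when Drive is not mounted.
target_dir = DRIVE_DIR if os.path.isdir(DRIVE_DIR) else tempfile.mkdtemp()

# Stand-in for the real tokenizer (an assumption for illustration).
tokenizer = {"hello": 1, "world": 2}

saved_path = save_pickle(tokenizer, target_dir)
restored = load_pickle(saved_path)
print("Saved to:", saved_path)
print("Round trip OK:", restored == tokenizer)
```

Returning the absolute path from the save helper makes it obvious whether the file landed inside the mounted drive or next to it in /content.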

desertnaut
  • Thanks desertnaut for the detailed response. It didn't solve the problem directly but led me to investigate further. The problem was that I was saving the file at the same level as the drive folder, not inside it, which is why I couldn't find it in Google Drive. I'll mark it as the answer. People who land here will be able to figure it out by reading our comments. – kaka Jun 11 '21 at 17:07