I know this might look like a duplicate, but I have not been able to find a solution that works for me. Or maybe I just need a complete example.

Here is the problem: I want to build a webpage that predicts the class of an input text using a pre-trained model. I have the JSON file for the TensorFlow.js model, plus both of the following:

  • tokeniser.json (saved by Keras Tokenizer().to_json())
  • vocab.json (saved as in this question, corresponding to tokenizer.word_index)

I know how to load the model into a JavaScript object with the async loading function of TensorFlow.js. How can I do the same for the tokeniser? And how can I then tokenise the input text with the imported tokeniser?

======================= Clarification ===========================

Examples of my JSON files can be found at these links

I tried the following code:

// loadVocab function to get the vocabulary from json.
async function loadVocab() {
  var word2index = await JSON.parse(await JSON.stringify(vocabPath));
  return word2index;
}

Here vocabPath is a string containing the URL above.

At the end of my script I call a function init():

async function init(){
    model = await loadModel();
    word2index = await loadVocab();
    console.log(word2index["the"]); // I expect 1
}

But of course I get undefined, since I guess it treats the URL string itself as the JSON, rather than fetching the JSON at that URL.
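To illustrate that guess, here is a minimal sketch of what the round trip in loadVocab() does. The URL is a placeholder and not the real one from my script.

```javascript
// Placeholder URL (an assumption for illustration only).
const vocabPath = "https://example.com/vocab.json";

// JSON.stringify turns the URL into a quoted JSON string and JSON.parse
// turns it straight back, so no network request is ever made.
const word2index = JSON.parse(JSON.stringify(vocabPath));

console.log(typeof word2index);        // "string"
console.log(word2index === vocabPath); // true
console.log(word2index["the"]);        // undefined
```

So the lookup runs against a plain string, which has no "the" property.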

Any idea?

Oscar
  • Is the question about how to load the JSON files? – edkeveked Oct 26 '20 at 19:49
  • It is more about how to load the JSON files as TensorFlow.js objects. – Oscar Oct 27 '20 at 07:07
  • The question is a bit unclear to me. Could you please give an example of the JSON you have and the kind of result you expect after loading it? – edkeveked Oct 27 '20 at 08:41
  • I have a simple JSON file containing a word-to-index "dictionary". I would like to implement a tokeniser in JavaScript that, given a string, looks up each word in the word-to-index dictionary and returns the list of integers corresponding to the input text. – Oscar Oct 27 '20 at 16:45

2 Answers

Suppose the vocabulary was saved from Python like this:

import json

# Write the tokenizer's word -> index mapping to a JSON file.
with open('word_dict.json', 'w') as file:
    json.dump(tokenizer.word_index, file)

To read it in the browser, you need to request the JSON file over HTTP, for example with an AJAX call like this:

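// Fetch a URL synchronously and return the response body as text.
// Note: a synchronous XMLHttpRequest blocks the page while it loads and is
// deprecated on the main thread; it is used here only for simplicity.
function getJSON(url) {
    var xmlHttp = new XMLHttpRequest();

    xmlHttp.open("GET", url, false); // false = synchronous request
    xmlHttp.send(null);

    return xmlHttp.responseText;
}

// Parse the downloaded text into a word -> index object.
var vocab = JSON.parse(getJSON('./word_dict.json'));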

The Python side is well explained here: Converting Python Keras NLP Model to Tensorflowjs

A related question covers the next step, how to vectorize the text: Tensorflow.js tokenizer
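As a rough sketch of that vectorizing step, the code below maps a string to a list of indices using the loaded word-to-index object. It assumes the Keras Tokenizer was left at its defaults (lowercasing, the default filter characters, splitting on spaces, no oov_token, so unknown words are dropped), and that sequences were padded with zeros at the front, as pad_sequences does by default. The names textToSequence, padSequence and demoVocab are made up for this example; adjust the settings if your tokenizer was configured differently.

```javascript
// Characters the Keras Tokenizer strips by default (assumed configuration).
const KERAS_FILTERS = '!"#$%&()*+,-./:;<=>?@[\\]^_`{|}~\t\n';

// Convert a text into a list of word indices.
// Words missing from wordIndex are skipped, mirroring a tokenizer
// that was created without an oov_token.
function textToSequence(text, wordIndex) {
  const filterSet = new Set(KERAS_FILTERS);
  // Lowercase, then replace every filtered character with a space.
  const cleaned = Array.from(text.toLowerCase(), (ch) =>
    filterSet.has(ch) ? " " : ch
  ).join("");

  const ids = [];
  for (const word of cleaned.split(" ")) {
    // Skip empty pieces and avoid inherited keys such as "constructor".
    if (word.length > 0 && Object.prototype.hasOwnProperty.call(wordIndex, word)) {
      ids.push(wordIndex[word]);
    }
  }
  return ids;
}

// Left-pad with zeros (or keep only the last maxLen items) so that the
// result always has exactly maxLen entries.
function padSequence(seq, maxLen) {
  const trimmed = seq.length > maxLen ? seq.slice(seq.length - maxLen) : seq;
  const padding = new Array(maxLen - trimmed.length).fill(0);
  return padding.concat(trimmed);
}

// Tiny hypothetical vocabulary, standing in for the parsed word_dict.json.
const demoVocab = { the: 1, cat: 2, sat: 3 };
const ids = textToSequence("The cat sat, the dog!", demoVocab);
console.log(ids);                 // [1, 2, 3, 1]
console.log(padSequence(ids, 6)); // [0, 0, 1, 2, 3, 1]
```

The padded array could then be wrapped in a tensor, for example with tf.tensor2d([padded]), before being passed to the model. The maxLen value has to match the input length the model was trained with.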

  • Thank you very much. I will try this solution. I give you the best answer, however, I will post mine as well, as I solved in a maybe less "pure", but more synthetic way. – Oscar Jan 21 '21 at 09:25

I finally solved the issue as follows:

let vocabPath = '/url/to/my/vocab.json';

async function loadVocab() {
    let vocab = await (await fetch(vocabPath)).json();
    return vocab;
}
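
A slightly more defensive variant, offered only as a sketch: it checks the HTTP status before parsing, so a wrong URL gives a readable error instead of a JSON parse failure. The name loadVocabChecked and the optional fetchImpl parameter are my own additions for illustration (the latter makes the function easy to try without a network).

```javascript
// Load a word -> index vocabulary from a URL, failing loudly on HTTP errors.
// fetchImpl is a hypothetical hook; it defaults to the global fetch.
async function loadVocabChecked(url, fetchImpl = fetch) {
  const response = await fetchImpl(url);
  if (!response.ok) {
    throw new Error(`Could not load vocabulary from ${url}: HTTP ${response.status}`);
  }
  // response.json() parses the body and resolves to the vocabulary object.
  return response.json();
}
```

It would be called the same way as before, for example word2index = await loadVocabChecked(vocabPath); inside init().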
Oscar