Trying to understand typeahead tokenizing

Question

I don't have formal schooling in computer science, so that may be why I don't understand it. I read on Wikipedia that tokenizing means breaking up strings of text into words or phrases called tokens. Those tokens can then be used as inputs. So I am assuming that's what the Bloodhound suggestion engine is doing when it calls Bloodhound.tokenizers.whitespace(d.num).

1) Please tell me why it says whitespace. 2) Does the above mean that it is splitting up an object by the num property? That is, does it put all the values of the num property into an array and store it somewhere magical in a property called datumTokenizer?

The full piece of that code:

  datumTokenizer: function(d) { 
    return Bloodhound.tokenizers.whitespace(d.num); 
  },

Look at the example from this page. It uses an object.

local: [{ num: 'one' }, { num: 'two' }, { num: 'three' }],

It's an obj. 3) Shouldn't we use something like the line below, which has obj in it?

Bloodhound.tokenizers.obj.whitespace(d.num). Notice the obj.
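On question 3, the source excerpt in EDIT 2 builds the obj variants with getObjTokenizer(whitespace), which suggests they are factories: you give them a property name and get back a datum tokenizer, rather than calling them with d.num directly. Here is a minimal sketch under that assumption. The names whitespace and objWhitespace are stand-ins written for illustration, not the library's own code.

```javascript
// Stand-in for Bloodhound.tokenizers.whitespace: split a string at runs of whitespace.
function whitespace(str) {
  str = (str === null || str === undefined) ? '' : String(str);
  return str ? str.split(/\s+/) : [];
}

// Hypothetical re-creation of an "obj" tokenizer factory (an assumption about
// what getObjTokenizer does): take a property name, return a function that
// tokenizes that property of whatever datum it is later given.
function objWhitespace(key) {
  return function tokenize(datum) {
    return whitespace(datum[key]);
  };
}

// Two ways of writing the same datum tokenizer.
var viaFunction = function (d) { return whitespace(d.num); };
var viaFactory = objWhitespace('num');

console.log(viaFunction({ num: 'one' })); // [ 'one' ]
console.log(viaFactory({ num: 'one' }));  // [ 'one' ]
```

If that reading is right, the obj form would be written with the property name as a string, for example Bloodhound.tokenizers.obj.whitespace('num'), and is just shorthand for the hand-written function.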

4) Would Bloodhound.tokenizers.whitespace(d.num) split up the example obj into ["one", "two", "three"]?
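For question 4, it helps to see what the tokenizer returns when it is handed one datum at a time, which is how the (datum) signature in the docs reads. This sketch uses a stand-in whitespace function copied from the source excerpt rather than the real library, and the variable names are made up for illustration.

```javascript
// Stand-in for Bloodhound.tokenizers.whitespace.
function whitespace(str) {
  str = (str === null || str === undefined) ? '' : String(str);
  return str ? str.split(/\s+/) : [];
}

var local = [{ num: 'one' }, { num: 'two' }, { num: 'three' }];

// Same shape as the datumTokenizer in the question.
var datumTokenizer = function (d) {
  return whitespace(d.num);
};

// Each datum gets its own token array; nothing is merged into one big list.
var tokensPerDatum = local.map(datumTokenizer);
console.log(tokensPerDatum); // [ [ 'one' ], [ 'two' ], [ 'three' ] ]
```

So each call produces a small array for a single object, not one ["one", "two", "three"] array for the whole list.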

According to this answer, I think the answer is yes, and he calls the list an "index".

And when we do queryTokenizer: Bloodhound.tokenizers.whitespace, the docs say: "queryTokenizer – A function with the signature (query) that transforms a query into an array of string tokens."

5) How are we doing that in this example? Oh, just thinking: are we using that datumTokenizer array, which has the strings of words separated by spaces, for the query? Why would we need to do that, though? We already have the array.

var mySource = new Bloodhound({
  datumTokenizer: function(d) { 
    return Bloodhound.tokenizers.whitespace(d.num); 
  },
  //doesn't whitespace need to be called? "A function with the signature (query)"[from docs]
  queryTokenizer: Bloodhound.tokenizers.whitespace,
  local: [{ num: 'one' }, { num: 'two' }, { num: 'three' }],
  prefetch: '/prefetch',
  remote: '/remote?q=%QUERY'
});

mySource.initialize();

$('.typeahead').typeahead(null, {
  displayKey: 'num',
  source: mySource.ttAdapter()
});
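On the comment in the code above about whether whitespace needs to be called: the option expects a function, so the function itself is passed and whoever receives it can call it later with the query. A minimal sketch of that pattern follows. MiniEngine is a made-up toy for illustration and is not how Bloodhound is implemented.

```javascript
// Stand-in for Bloodhound.tokenizers.whitespace.
function whitespace(str) {
  str = (str === null || str === undefined) ? '' : String(str);
  return str ? str.split(/\s+/) : [];
}

// Hypothetical toy engine: it only stores the tokenizer it was given.
function MiniEngine(options) {
  this.queryTokenizer = options.queryTokenizer;
}

// Later, when the user types something, the engine calls the stored function.
MiniEngine.prototype.tokenizeQuery = function (query) {
  return this.queryTokenizer(query);
};

// Note: whitespace is passed without parentheses, so it is not called here.
var engine = new MiniEngine({ queryTokenizer: whitespace });

console.log(engine.tokenizeQuery('one two')); // [ 'one', 'two' ]
```

Writing queryTokenizer: whitespace() instead would call the function immediately with no argument and hand over its result (an empty array here) rather than a function.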

EDIT: Why does it say whitespace? Whitespace from where?
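As far as the name goes, the split(/\s+/) in the source excerpt in EDIT 2 cuts a string wherever it contains whitespace characters (spaces, tabs, newlines). A quick sketch with a stand-in copy of that function, using made-up sample strings:

```javascript
// Stand-in for Bloodhound.tokenizers.whitespace, following the source excerpt.
function whitespace(str) {
  str = (str === null || str === undefined) ? '' : String(str);
  return str ? str.split(/\s+/) : [];
}

console.log(whitespace('one'));             // [ 'one' ]
console.log(whitespace('twenty one'));      // [ 'twenty', 'one' ]
console.log(whitespace('new\tyork  city')); // [ 'new', 'york', 'city' ]
console.log(whitespace(''));                // []
```

With single-word values such as 'one', there is no whitespace to split at, so each value simply becomes a one-element array.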

EDIT 2: I found what seems to be relevant source code.

 var tokenizers = function() {
    "use strict";
    return {
        nonword: nonword,
        whitespace: whitespace,
        obj: {
            nonword: getObjTokenizer(nonword),
            whitespace: getObjTokenizer(whitespace)
        }
    };
    function whitespace(str) {
        str = _.toStr(str);
        return str ? str.split(/\s+/) : [];
    }

So I see that if it is an object, it converts it to a string and splits it into an array at whitespace. So how exactly does datumTokenizer work? How is it iterating through the object (getting each d.num)? Do you see a for loop or forEach()? Sorry for so many questions, but it's all one topic.

EDIT 3: I just saw this in the docs:
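The loop is not in the tokenizer itself, so it presumably lives in whatever code consumes the data and calls datumTokenizer once per item. Here is a heavily simplified sketch of that idea. buildIndex and search are hypothetical helpers written for illustration, and the plain lookup table is an assumption that stands in for whatever structure the library really uses.

```javascript
// Stand-in for Bloodhound.tokenizers.whitespace.
function whitespace(str) {
  str = (str === null || str === undefined) ? '' : String(str);
  return str ? str.split(/\s+/) : [];
}

// Hypothetical index builder: loops over the data, asks the tokenizer for
// each datum's tokens, and records which datums produced which token.
function buildIndex(data, datumTokenizer) {
  var index = Object.create(null); // no inherited keys to collide with tokens
  data.forEach(function (datum) {
    datumTokenizer(datum).forEach(function (token) {
      var key = token.toLowerCase();
      (index[key] = index[key] || []).push(datum);
    });
  });
  return index;
}

// Hypothetical lookup: return every datum that has a token starting with the query.
function search(index, query) {
  var q = query.toLowerCase();
  var hits = [];
  Object.keys(index).forEach(function (token) {
    if (token.indexOf(q) === 0) {
      index[token].forEach(function (datum) {
        if (hits.indexOf(datum) === -1) hits.push(datum);
      });
    }
  });
  return hits;
}

var local = [{ num: 'one' }, { num: 'two' }, { num: 'three' }];
var index = buildIndex(local, function (d) { return whitespace(d.num); });

console.log(search(index, 't')); // [ { num: 'two' }, { num: 'three' } ]
```

In this sketch the forEach over the data is what supplies d to the tokenizer, which is why the tokenizer only ever has to deal with one object.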

datumTokenizer – A function with the signature (datum) that transforms a datum into an array of string tokens. Required.

So the "datum" in this case is the object that belongs to the property local in the Bloodhound instance?

I think I'm getting there.
