In this code:
>>> import nltk
>>> def word_features(sentence):
...     features = {}
...     for word in nltk.word_tokenize(sentence):
...         features['contains(%s)' % word.lower()] = True
...     return features
...
>>> sent = 'This a foobar word extractor function'
>>> word_features(sent)
{'contains(a)': True, 'contains(word)': True, 'contains(this)': True, 'contains(function)': True, 'contains(extractor)': True, 'contains(foobar)': True}
>>>
This line populates the features dictionary:
features['contains(%s)' % word.lower()] = True
Here's a simple example of a dictionary in Python (see https://docs.python.org/2/tutorial/datastructures.html#dictionaries for details):
>>> adict = {}
>>> adict['key'] = 'value'
>>> adict['key']
'value'
>>> adict['apple'] = 'red'
>>> adict['apple']
'red'
>>> adict
{'apple': 'red', 'key': 'value'}
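Assigning to a key that already exists replaces its value instead of adding a second entry, which is why a word that occurs twice in a sentence still ends up as a single feature. A minimal sketch, reusing the adict name from above:

```python
adict = {}
adict['apple'] = 'red'
adict['apple'] = 'green'   # same key: the old value is replaced

print(adict)       # {'apple': 'green'}
print(len(adict))  # 1
```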
And word.lower() lowercases a string, e.g.
>>> word = 'Apple'
>>> word.lower()
'apple'
>>> word = 'APPLE'
>>> word.lower()
'apple'
>>> word = 'AppLe'
>>> word.lower()
'apple'
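Lowercasing matters for the features dictionary because dictionary keys are case-sensitive. A small sketch of the difference (the names words, without_lower and with_lower are made up for illustration):

```python
words = ['Apple', 'apple', 'APPLE']

without_lower = {}
with_lower = {}
for w in words:
    without_lower[w] = True       # keys are case-sensitive: three separate keys
    with_lower[w.lower()] = True  # every spelling maps to the key 'apple'

print(len(without_lower))  # 3
print(with_lower)          # {'apple': True}
```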
And when you do 'contains(%s)' % word, it builds a string out of contains(, a %s placeholder, and a closing ). The value that fills the placeholder is supplied outside the string, after the % operator, e.g.
>>> a = 'apple'
>>> o = 'orange'
>>> '%s' % a
'apple'
>>> '%s and' % a
'apple and'
>>> '%s and %s' % (a,o)
'apple and orange'
The % operator is similar to the str.format() method, e.g.
>>> a = 'apple'
>>> o = 'orange'
>>> '%s and %s' % (a,o)
'apple and orange'
>>> '{} and {}'.format(a,o)
'apple and orange'
So when the code does 'contains(%s)' % word, it produces a string like this:
>>> 'contains(%s)' % a
'contains(apple)'
And when you use that string as a dictionary key, the dictionary looks like this:
>>> adict = {}
>>> key1 = 'contains(%s)' % a
>>> value1 = True
>>> adict[key1] = value1
>>> adict
{'contains(apple)': True}
>>> key2 = 'contains(%s)' % o
>>> value2 = False
>>> adict[key2] = value2
>>> adict
{'contains(orange)': False, 'contains(apple)': True}
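Putting the pieces together, here is a rough sketch of what word_features does step by step. To stay runnable without NLTK it splits on whitespace with str.split() instead of calling nltk.word_tokenize(); that substitution is an assumption of this sketch, not what the original function does (the NLTK tokenizer also separates punctuation). The name word_features_sketch is made up for illustration.

```python
def word_features_sketch(sentence):
    features = {}
    # Assumption: whitespace splitting stands in for nltk.word_tokenize().
    for word in sentence.split():
        key = 'contains(%s)' % word.lower()  # e.g. 'contains(foobar)'
        features[key] = True                 # add the key, or overwrite it
    return features

result = word_features_sketch('This a foobar word extractor function')
for key in sorted(result):
    print(key, result[key])
```

For a sentence without punctuation, such as this one, the result should have the same keys as the NLTK version shown at the top.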
For more information, see