
I'm reading the book "Python Text Processing with NLTK", and in the book the results are:

>>> stopwords.fileids()
['danish', 'dutch', 'english', 'finnish', 'french', 'german', 'hungarian', 'italian', 'norwegian', 'portuguese', 'russian', 'spanish', 'swedish', 'turkish']

But when I run the code in the terminal, the results are:

>>> stopwords.fileids()
[u'danish', u'dutch', u'english', u'finnish', u'french', u'german', u'hungarian', u'italian', u'norwegian', u'portuguese', u'russian', u'spanish', u'swedish', u'turkish']

What is the "u" in front of every string?

Will Vousden
Kathryn
  • Also a possible duplicate of [What does the 'u' symbol mean in front of string values?](http://stackoverflow.com/questions/11279331/what-does-the-u-symbol-mean-in-front-of-string-values) – koukouviou Mar 26 '16 at 07:28
  • Comparing two strings, one with the 'u' prefix and one without, still returns True (a perfect match) for ASCII text, so in many cases you don't need to worry about it. – akash12300 Mar 26 '16 at 08:19
    You're currently using Python 2. You'd want to switch to Python 3 for natural language processing, as it has superior text manipulation capabilities; the `u` prefix also no longer appears in the output there. NLTK version 3.0 supports Python 3. – Antti Haapala -- Слава Україні Mar 26 '16 at 09:25
  • @AnttiHaapala, if the user were using Python 2 with NLTK v. 2, there would be no unicode prefix, correct? The results in the text she is using were (presumably) generated with Python 2 and NLTK 2, and those results show *no* unicode prefix. – DyingIsFun Aug 04 '16 at 21:18
    @Silenus perhaps; but by now anyone doing natural language processing should really be using both Python 3 and NLTK 3. – Antti Haapala -- Слава Україні Aug 05 '16 at 04:54
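A minimal sketch of the points raised in the comments, assuming Python 3.3 or later (where the `u` prefix is still accepted for backward compatibility but changes nothing). The sample strings are hard-coded stand-ins, not loaded from NLTK:

```python
# In Python 3 every str is already Unicode, so the u prefix is a no-op.
a = u'english'
b = 'english'

print(a == b)          # True: both literals are equal
print(type(a) is str)  # True: the u-prefixed literal is an ordinary str

# The list repr no longer shows a u prefix, matching the book's output.
print(repr([a, b]))    # ['english', 'english']
```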



In Python 2, the `u` prefix marks a Unicode string, i.e. an object of type `unicode` rather than a byte string of type `str`.

You can check this yourself by typing the following into a Python 2 interpreter:

# Python 2 only: `unicode` is a built-in type there
s = unicode('abcdef')
type(s)  # <type 'unicode'>
t = u'unicode'
type(t)  # <type 'unicode'>
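For comparison, a rough Python 3 counterpart of the check above (assuming Python 3, where the separate `unicode` type no longer exists): all text is of type `str`, and the old byte-string role is taken over by `bytes`, which is displayed with a `b` prefix instead:

```python
# Python 3: text literals, with or without u, are all of type str.
s = 'abcdef'
t = u'unicode'
print(type(s))  # <class 'str'>
print(type(t))  # <class 'str'>

# Encoding text produces raw bytes, a distinct type with a b prefix.
data = t.encode('utf-8')
print(type(data))  # <class 'bytes'>
print(data)        # b'unicode'

# Decoding the bytes gives back the original text.
print(data.decode('utf-8') == t)  # True
```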

More information on Unicode strings: Python 2 | Python 3

Michal Frystacky