
I am using TextBlob for processing textual data.

My code is:

from textblob import TextBlob
wiki = TextBlob("Python is a high-level, general-purpose programming language.")
wiki.tags

I am getting this output:

[(u'Python', u'NNP'), (u'is', u'VBZ'), (u'a', u'DT'), (u'high-level', u'JJ'), (u'general-purpose', u'JJ'), (u'programming', u'NN'), (u'language', u'NN')]

instead of:

[('Python', 'NNP'), ('is', 'VBZ'), ('a', 'DT'), ('high-level', 'JJ'), ('general-purpose', 'JJ'), ('programming', 'NN'), ('language', 'NN')]

What might be the reason for the letter 'u' being prepended to each word?

I'm working on Ubuntu 14.04.2 with Python 2.7.6.

overlord
  • converted to unicode strings. – Avinash Raj Jul 11 '15 at 06:28
  • any way to remove that? – overlord Jul 11 '15 at 06:30
  • @overlord: _Why_ do you want to remove it? The `u` signifies that it's a unicode string, which is generally a good feature (unless you intentionally do not want to support unicode). You only see the `u` when you are printing the array in this way. When you actually use the value it shouldn't display the `u` anywhere. – grovesNL Jul 11 '15 at 06:36
  • Yes, I now know the reason after reading john0609's comment which happened to come after my above comment. Thanks for taking time to reply. – overlord Jul 11 '15 at 06:38

1 Answer


The `u` prefix is how Python 2 represents a unicode string. It has no effect on how the string is stored or on string manipulation. It is also useful to have a marker that shows which string type is being displayed, and this is the convention Python follows.
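As a rough illustration of the point above, the sketch below uses the first three tags copied from the question's output in place of a live `wiki.tags` call, so it does not need TextBlob installed. The names `tags`, `formatted`, and `display` are made up for this example. Under Python 2, printing the list itself shows the `u` prefix because a list displays the repr of each element; printing or joining the individual strings does not.

```python
from __future__ import print_function

# First three tags copied from the question's output (stand-in for wiki.tags).
tags = [(u'Python', u'NNP'), (u'is', u'VBZ'), (u'a', u'DT')]

# Printing each string on its own shows only the text, with no prefix.
for word, tag in tags:
    print(word, tag)

# Text built from the values carries no prefix either.
formatted = u' '.join(u'%s/%s' % (word, tag) for word, tag in tags)
print(formatted)

# If the list-like layout is wanted without the prefix, one option is to
# format it by hand instead of relying on the list's own repr.
display = u'[' + u', '.join(u"('%s', '%s')" % pair for pair in tags) + u']'
print(display)
```

In Python 3 every `str` is already unicode, so the prefix is not displayed there at all.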

john0609