0

I'm new to Python, so apologies for a very basic question.

I'm using the pattern.en library to get the synonyms of a word. This is my code, and it works fine:

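from pattern.en import wordnet
# Look up the synsets for the word, then print the synonyms of the first one (Python 2 print statement).
a = wordnet.synsets('human')
print a[0].synonyms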

This is the output I get:

[u'homo', u'man', u'human being', u'human']

But for my program I need to insert this array like this:

['homo', 'man', 'human being', 'human']

How do I get output like the above, without the 'u' prefix?

Thanks in advance!

Cactus

2 Answers

3

Try encoding the strings explicitly. But note that the u prefix has no effect on the data: it only marks the value as a unicode object (as opposed to a byte string). If your code needs unicode back later, it is better to feed it unicode.

>>> d = [u'homo', u'man', u'human being', u'human']
>>> print [i.encode('utf-8') for i in d]
['homo', 'man', 'human being', 'human']
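
If the goal is only to display the words without the prefix, no conversion is needed at all. A minimal sketch, assuming the list from the question (the variable names are illustrative and not part of pattern.en): the u appears only in the repr of the list, not when the items themselves are printed or joined.

```python
# Sample data copied from the question; in Python 2 these are unicode objects.
synonyms = [u'homo', u'man', u'human being', u'human']

# Joining the items produces the text itself rather than the list's repr,
# so no u prefix is shown.
joined = u', '.join(synonyms)
print(joined)

# The same holds when printing the items one by one.
for word in synonyms:
    print(word)
```

The single-argument print(...) form is used so the snippet runs unchanged on both Python 2 and Python 3.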
alvas
Learner
  • It works, but it's not good advice. There's no need to go back to a byte array / string representation, since equivalence is never an issue with unicode vs str in Python 2, and in Python 3 str is unicode by default – alvas Jan 25 '16 at 11:56
  • @alvas Yes, and so I mentioned it (`u does not have any effect on data`) – Learner Jan 25 '16 at 15:30
  • @Slslam hope you don't mind if I bold it =) – alvas Jan 25 '16 at 15:43
1

In short:

There's no need to convert your list of unicode objects into strings. For plain ASCII text like this, they compare equal.


In long:

The u'...' prefix marks a Unicode string literal. The Unicode object was introduced in Python 2.0, see https://docs.python.org/2/tutorial/introduction.html#unicode-strings

Starting with Python 2.0 a new data type for storing text data is available to the programmer: the Unicode object. It can be used to store and manipulate Unicode data (see http://www.unicode.org/) and integrates well with the existing string objects, providing auto-conversions where necessary.

And since Python 3.0, see https://docs.python.org/3.2/tutorial/introduction.html#about-unicode:

Starting with Python 3.0 all strings support Unicode (see http://www.unicode.org/).

Whichever string type is the default, the two forms compare equal in both Python 2.x and 3.x (in Python 2, as long as the text is ASCII):

alvas@ubi:~$ python2
Python 2.7.11 (default, Dec 15 2015, 16:46:19) 
[GCC 4.8.4] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> type(u'man')
<type 'unicode'>
>>> type('man')
<type 'str'>
>>> u'man' == 'man'
True

alvas@ubi:~$ python3
Python 3.4.1 (default, Jun  4 2014, 11:27:44) 
[GCC 4.8.3] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> type(u'man')
<class 'str'>
>>> type('man')
<class 'str'>
>>> u'man' == 'man'
True
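
Applied to the list from the question, the same equality means the unicode list can be used directly wherever the plain list is expected. A small sketch with illustrative variable names, assuming ASCII-only content as in the question:

```python
# The list as returned in the question, and the list the asker wants.
unicode_list = [u'homo', u'man', u'human being', u'human']
plain_list = ['homo', 'man', 'human being', 'human']

# Element-wise comparison succeeds, so the lists are equal.
lists_equal = unicode_list == plain_list

# Membership tests and lookups with plain string literals also work.
has_man = 'man' in unicode_list
position = unicode_list.index('human being')

print(lists_equal)
```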

In Python 2, though, the two types are different, which matters when you are required to convert from unicode to str, say for type checks:

alvas@ubi:~$ python3
>>> u'man' == 'man'
True
>>> type(u'man') == type('man')
True
>>> exit()
alvas@ubi:~$ python2
>>> u'man' == 'man'
True
>>> type(u'man') == type('man')
False

In that case you can simply convert it with str(u'man') or u'man'.encode('utf-8'). Note that in Python 2, str() only works here when the text is pure ASCII.
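
A hedged illustration of why an explicit .encode('utf-8') is the more robust choice: in Python 2, str() on a unicode object implicitly encodes with the ASCII codec, which the sketch below imitates with an explicit .encode('ascii') so that it behaves the same on Python 2 and 3. The sample word is made up for the example.

```python
# A word with one non-ASCII character (e-acute), written with an escape
# so the source file itself stays ASCII.
word = u'caf\xe9'

# Encoding to UTF-8 always succeeds and yields a byte string.
utf8_bytes = word.encode('utf-8')

# Encoding to ASCII fails for this word; this mirrors what the implicit
# conversion in Python 2's str(word) would do.
try:
    word.encode('ascii')
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False

print(ascii_ok)
```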

But there can be some "pain" / endless errors if your unicode string is outside the ASCII range and you try to write it to a file or print it to a console whose default encoding is not set to 'utf-8'. In that case, watch https://www.youtube.com/watch?v=sgHbC6udIqc
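
For the file-writing case, one common way to avoid these errors is to state the encoding explicitly when opening the file instead of relying on a default. This is only a sketch: the file name, the temporary directory, and the sample words are assumptions made for the example. io.open is used because it accepts an encoding argument on both Python 2 and Python 3.

```python
import io
import os
import tempfile

# Sample data; the last item is deliberately outside the ASCII range.
words = [u'homo', u'man', u'na\xefve']

# Hypothetical output location inside a fresh temporary directory.
path = os.path.join(tempfile.mkdtemp(), 'synonyms.txt')

# Write the unicode text with an explicit encoding.
with io.open(path, 'w', encoding='utf-8') as handle:
    handle.write(u'\n'.join(words))

# Read it back with the same encoding to get unicode text again.
with io.open(path, 'r', encoding='utf-8') as handle:
    round_trip = handle.read().split(u'\n')
```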


Additionally, here are similar questions relating to the u'...' prefix:

Community
alvas