51

I have read up on how to remove the character 'u' from a list, but I am using Google App Engine and it does not seem to work!

def get(self):
    players = db.GqlQuery("SELECT * FROM Player")
    print players
    playerInfo  = {}

    test = []

    for player in players:
        email =  player.email
        gem =  str(player.gem)
        a = "{email:"+email + ",gem:" +gem +"}"

        test.append(a)


    ast.literal_eval(json.dumps(test))
    print test

Final output:

[u'{email:test@gmail.com,gem:0}', u'{email:test,gem:0}', u'{email:test,gem:0}', u'{email:test,gem:0}', u'{email:test,gem:0}', u'{email:test1,gem:0}']
smci
Brian Li
  • The character "u" isn't in the list, it's in the `repr` of a unicode string, which is what's printed if you try to `print` a whole list. – Wooble Mar 19 '12 at 15:39
  • The `u` denotes Unicode strings. It doesn't seem to be a problem by itself that the list contains Unicode strings, so what's your actual issue? – Sven Marnach Mar 19 '12 at 15:39
  • The code `ast.literal_eval(json.dumps(test))` calculates a value and then throws it away. – Karl Knechtel Mar 19 '12 at 15:42

8 Answers

63

That 'u' is part of the external representation of the string, meaning it's a Unicode string as opposed to a byte string. It's not in the string, it's part of the type.

As an example, you can create a new Unicode string literal by using the same syntax. For instance:

>>> sandwich = u"smörgås"
>>> sandwich
u'sm\xf6rg\xe5s'

This creates a new Unicode string whose value is the Swedish word for sandwich. You can see that the non-English characters are represented by their Unicode code points, ö is \xf6 and å is \xe5. The 'u' prefix appears just like in your example to signify that this string holds Unicode text.

To get rid of those, you need to encode the Unicode string into some byte-oriented representation, such as UTF-8. You can do that with e.g.:

>>> sandwich.encode("utf-8")
'sm\xc3\xb6rg\xc3\xa5s'

Here, we get a new string without the prefix 'u', since this is a byte string. It contains the bytes representing the characters of the Unicode string, with the Swedish characters resulting in multiple bytes due to the wonders of the UTF-8 encoding.
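The difference between the Unicode string and its encoded bytes can be checked by comparing lengths. This is only a minimal sketch; the variable names are illustrative, and the snippet uses escape sequences so that it should run unchanged on Python 2.7 and Python 3:

```python
sandwich = u"sm\xf6rg\xe5s"          # same value as u"smörgås"
encoded = sandwich.encode("utf-8")   # byte string, no longer Unicode

char_count = len(sandwich)           # counts characters
byte_count = len(encoded)            # counts bytes; ö and å take two bytes each
round_trip = encoded.decode("utf-8") # decoding recovers the original text

print(char_count)
print(byte_count)
```

The 'u' prefix never contributes to either length, because it is part of how the object is displayed rather than part of its contents.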

unwind
  • Don't confuse a Unicode string (an object in memory) and its text representation (that you could use to specify the object in Python source code). Consider `print(sandwich)` vs. `print(repr(sandwich))`. Don't encode text to bytes. – jfs Oct 29 '15 at 19:50
24
arr = [str(r) for r in arr]

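This converts every element to a byte string (str), so the u prefix, which marks a Unicode string, no longer appears when the list is printed. Note that in Python 2 str() encodes with the default ASCII codec, so it raises UnicodeEncodeError if any element contains non-ASCII characters.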
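One caveat with this approach: in Python 2, calling str() on a Unicode string encodes it with the ASCII codec, which fails for non-ASCII text. The sketch below makes that encoding step explicit so it behaves the same on Python 2.7 and Python 3; the helper name `to_ascii_or_none` is hypothetical:

```python
arr = [u"test@gmail.com", u"sm\xf6rg\xe5s"]

def to_ascii_or_none(text):
    """Return ASCII bytes for text, or None if it contains non-ASCII characters."""
    try:
        return text.encode("ascii")
    except UnicodeEncodeError:
        return None

results = [to_ascii_or_none(item) for item in arr]
print(results)
```

The first element converts cleanly, while the second cannot be represented in ASCII and is reported as None instead of raising.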

mohdnaveed
  • Although this code may help to solve the problem, it doesn't explain _why_ and/or _how_ it answers the question. Providing this additional context would significantly improve its long-term educational value. Please [edit] your answer to add explanation, including what limitations and assumptions apply. – Toby Speight Sep 29 '16 at 14:21
  • And use StackOverflow's code formatting markdown to code-format your snippets ;) – brandonscript Sep 29 '16 at 19:05
18

The u means the strings are Unicode. Encode each string to ASCII to get rid of it; the 'ignore' argument silently drops any characters that ASCII cannot represent:

a.encode('ascii', 'ignore')
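It may help to see what the error handler does to non-ASCII text before relying on it. A small sketch comparing three standard handlers of `encode` (the sample string is just an illustration):

```python
name = u"sm\xf6rg\xe5s"  # u"smörgås"

ignored = name.encode("ascii", "ignore")            # drops ö and å entirely
replaced = name.encode("ascii", "replace")          # substitutes "?" for each
escaped = name.encode("ascii", "backslashreplace")  # keeps them as escape text

print(ignored)
print(replaced)
print(escaped)
```

With 'ignore' the information is lost, so this is only safe when the data is known to be plain ASCII or when losing those characters is acceptable.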
Intra
13

u'AB' is just a text representation of the corresponding Unicode string. Here are several ways to create exactly the same Unicode string:

L = [u'AB', u'\x41\x42', u'\u0041\u0042', unichr(65) + unichr(66)]
print u", ".join(L)

Output

AB, AB, AB, AB

There is no u'' in memory. It is just the way to represent the unicode object in Python 2 (how you would write the Unicode string literal in Python source code). By default, print L is equivalent to print "[%s]" % ", ".join(map(repr, L)), i.e., the repr() function is called for each list item:

print L
print "[%s]" % ", ".join(map(repr, L))

Output

[u'AB', u'AB', u'AB', u'AB']
[u'AB', u'AB', u'AB', u'AB']

If you are working in a REPL then a customizable sys.displayhook is used that calls repr() on each object by default:

>>> L = [u'AB', u'\x41\x42', u'\u0041\u0042', unichr(65) + unichr(66)]
>>> L
[u'AB', u'AB', u'AB', u'AB']
>>> ", ".join(L)
u'AB, AB, AB, AB'
>>> print ", ".join(L)
AB, AB, AB, AB

Don't encode to bytes. Print unicode directly.
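If a bracketed, list-like display without the prefixes is wanted, one option is to build the string from the items themselves rather than from their repr(). This is a sketch only; `format_list` is a hypothetical helper, and it assumes every item is already a Unicode string:

```python
L = [u"AB", u"\x41\x42", u"\u0041\u0042"]

def format_list(items):
    """Build a bracketed display string from the items, not from their repr()."""
    return u"[%s]" % u", ".join(items)

display = format_list(L)
print(display)
```

The result is still a Unicode string, so nothing has been encoded to bytes along the way.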


In your specific case, I would create a Python list and use json.dumps() to serialize it instead of using string formatting to create JSON text:

#!/usr/bin/env python2
import json
# ...
test = [dict(email=player.email, gem=player.gem)
        for player in players]
print test
print json.dumps(test)

Output

[{'email': u'test@gmail.com', 'gem': 0}, {'email': u'test', 'gem': 0}, {'email': u'test', 'gem': 0}, {'email': u'test', 'gem': 0}, {'email': u'test', 'gem': 0}, {'email': u'test1', 'gem': 0}]
[{"email": "test@gmail.com", "gem": 0}, {"email": "test", "gem": 0}, {"email": "test", "gem": 0}, {"email": "test", "gem": 0}, {"email": "test", "gem": 0}, {"email": "test1", "gem": 0}]
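The same idea can be tried outside App Engine. In the sketch below, `Player` is a hypothetical namedtuple standing in for the datastore entity, and `sort_keys=True` is added only to make the key order predictable across Python versions:

```python
import json
from collections import namedtuple

# Hypothetical stand-in for the datastore Player entity in the question.
Player = namedtuple("Player", ["email", "gem"])
players = [Player(u"test@gmail.com", 0), Player(u"test1", 0)]

# Build plain Python data first, then let json.dumps() produce the text.
test = [dict(email=player.email, gem=player.gem) for player in players]
payload = json.dumps(test, sort_keys=True)

# Parsing the text gives back equal data, which confirms it is valid JSON.
restored = json.loads(payload)
print(payload)
```

Unlike the hand-built "{email:...}" strings in the question, this output quotes keys and string values, so a JSON parser can read it back.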
jfs
6
[u'{email:test@gmail.com,gem:0}', u'{email:test,gem:0}', u'{email:test,gem:0}', u'{email:test,gem:0}', u'{email:test,gem:0}', u'{email:test1,gem:0}']

The 'u' prefix denotes Unicode strings. You can remove it by applying the map function to the final list:

map(str, test)

Another way is to convert each item as you append it to the list:

test.append(str(a))
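A portable variant of the map call is sketched below. It assumes the items are pure ASCII, as in the question; wrapping the call in list() matters on Python 3, where map returns a lazy iterator and str is already a Unicode type:

```python
test = [u"{email:test@gmail.com,gem:0}", u"{email:test,gem:0}"]

# list() forces the conversion so the result can be printed or indexed.
converted = list(map(str, test))

# The text is unchanged; only the string type (and so the repr) differs on Python 2.
same_text = [old == new for old, new in zip(test, converted)]
print(converted)
```

The comparison shows that the conversion changes how the list is displayed, not what the strings contain.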
tripleee
HimanshuGahlot
  • This basically duplicates an existing answer https://stackoverflow.com/a/39771186/874188 – tripleee Mar 12 '18 at 16:46
  • Hey @tripleee try to use timeit for both the solutions. You will be able to see the difference. Map method is faster. And `test.append(str(a))` is creating list simultaneously not iterating upon the list after creating the list hence saving the time. – HimanshuGahlot Mar 13 '18 at 12:12
5

Use Python's map() function.

Input: for a list of values

index = [u'CARBO1004', u'CARBO1006', u'CARBO1008', u'CARBO1009', u'CARBO1020']

encoded_string = map(str, index)

Output: ['CARBO1004', 'CARBO1006', 'CARBO1008', 'CARBO1009', 'CARBO1020']

For a Single string input:

index = u'CARBO1004'
# Use either one of these encodings.
index.encode("utf-8")  # encode as UTF-8
index.encode('ascii', 'ignore')  # encode as ASCII, dropping characters that cannot be encoded

Output: 'CARBO1004'

Hilar AK
4

You don't "remove the character 'u' from a list", you encode Unicode strings. In fact the strings you have are perfectly fine for most uses; you will just need to encode them appropriately before outputting them.
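One common way to follow this advice is to keep text as Unicode inside the program and encode only at the output boundary. The sketch below uses a temporary file as the output; the file and variable names are illustrative, and `io.open` is used so that it should work on both Python 2.7 and Python 3:

```python
import io
import os
import tempfile

text = u"sm\xf6rg\xe5s"  # stays Unicode everywhere inside the program

fd, path = tempfile.mkstemp(suffix=".txt")
os.close(fd)
try:
    # The encoding is chosen once, where the text leaves the program.
    with io.open(path, "w", encoding="utf-8") as handle:
        handle.write(text)
    # Reading in binary mode shows the bytes that were actually written.
    with io.open(path, "rb") as handle:
        raw = handle.read()
    # Reading with the same encoding decodes back to the original text.
    with io.open(path, "r", encoding="utf-8") as handle:
        decoded = handle.read()
finally:
    os.remove(path)
```

Which encoding to use depends on where the output goes, so UTF-8 here is an assumption rather than a rule.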

kindall
  • you don't need to encode a Unicode string; you could print it directly `print(unicode_string)`, [example](http://stackoverflow.com/a/30551552/4279) – jfs Oct 29 '15 at 19:59
  • Depends on where you're outputting it to. – kindall Oct 29 '15 at 20:53
  • obviously, though the default is still: use Unicode to work with text in Python. Don't encode to bytes unless necessary ([my answer shows that it is not necessary](http://stackoverflow.com/a/33423708/4279)) -- I'm sure you know the concept of Unicode sandwich – jfs Oct 29 '15 at 21:03
-1

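For rows returned by a database cursor, you can index into each row so that the value itself is printed rather than the repr of the whole row.

tmpColumnsSQL = "show columns in dim.date_dim"
hiveCursor.execute(tmpColumnsSQL)
columnlist = hiveCursor.fetchall()

# Iterate over the rows directly and print the first field of each
for columns in columnlist:
    print columns[0]

# Equivalent loop using an explicit index
for i in range(len(columnlist)):
    print columnlist[i][0]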
Community
mdcscry