Questions tagged [python-unicode]

Python distinguishes between byte strings and Unicode strings. *Decoding* transforms byte strings into Unicode strings; *encoding* transforms Unicode strings into bytes.

Remember: you decode your input to unicode, work with unicode, then encode unicode objects for output as bytes.
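
The decode, work, encode pattern described above can be sketched as follows. This is a minimal illustration in Python 3, not a definitive recipe: the sample bytes, the `utf-8` encoding, and the variable names are all assumptions made for the example, and real input may use a different encoding.

```python
# Hypothetical input: bytes as they might arrive from a file or socket.
# We assume they are UTF-8 encoded; real data may use another encoding.
raw_input_bytes = b"caf\xc3\xa9 \xe2\x82\xac5"

# 1. Decode at the input boundary: bytes -> str (Unicode text).
text = raw_input_bytes.decode("utf-8")

# 2. Work with Unicode text only; string methods operate on characters.
shouted = text.upper()

# 3. Encode at the output boundary: str -> bytes.
raw_output_bytes = shouted.encode("utf-8")

print(len(raw_input_bytes), len(text))  # byte count vs. character count
print(raw_output_bytes)
```

Keeping the conversions at the edges of the program means the code in between never mixes bytes and text, which is the usual source of the `UnicodeEncodeError` and `UnicodeDecodeError` questions under this tag.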

1053 questions
1488
votes
34 answers

UnicodeEncodeError: 'ascii' codec can't encode character u'\xa0' in position 20: ordinal not in range(128)

I'm having problems dealing with unicode characters from text fetched from different web pages (on different sites). I am using BeautifulSoup. The problem is that the error is not always reproducible; it sometimes works with some pages, and…
Homunculus Reticulli
  • 65,167
  • 81
  • 216
  • 341
395
votes
13 answers

UnicodeDecodeError: 'utf8' codec can't decode byte 0x9c

I have a socket server that is supposed to receive UTF-8 valid characters from clients. The problem is some clients (mainly hackers) are sending all the wrong kind of data over it. I can easily distinguish the genuine client, but I am logging to…
transilvlad
  • 13,974
  • 13
  • 45
  • 80
319
votes
7 answers

"SyntaxError: Non-ASCII character ..." or "SyntaxError: Non-UTF-8 code starting with ..." trying to use non-ASCII text in a Python script

I tried this code in Python 2: def NewFunction(): return '£' But I get an error message that says: SyntaxError: Non-ASCII character '\xa3' in file '...' but no encoding declared; see http://www.python.org/peps/pep-0263.html for…
SNIFFER_dog
  • 3,243
  • 2
  • 14
  • 4
172
votes
10 answers

How to print Unicode character in Python?

I want to make a dictionary where English words point to Russian and French translations. How do I print out unicode characters in Python? Also, how do you store unicode chars in a variable?
NoobDev4iPhone
  • 5,531
  • 10
  • 33
  • 33
131
votes
7 answers

Why does ENcoding a string result in a DEcoding error (UnicodeDecodeError)?

I'm really confused. I tried to encode but the error said can't decode.... >>> "你好".encode("utf8") Traceback (most recent call last): File "<stdin>", line 1, in <module> UnicodeDecodeError: 'ascii' codec can't decode byte 0xe4 in position 0:…
thoslin
  • 6,659
  • 6
  • 27
  • 29
55
votes
3 answers

Python string to unicode

Possible Duplicate: How do I treat an ASCII string as unicode and unescape the escaped characters in it in python? How do convert unicode escape sequences to unicode characters in a python string I have a string that contains unicode characters…
prongs
  • 9,422
  • 21
  • 67
  • 105
47
votes
2 answers

Python string argument without an encoding

I am trying to run this piece of code, and it keeps giving an error saying "String argument without an encoding": ota_packet = ota_packet.encode('utf-8') + bytearray(content[current_pos:(final_pos)]) + '\0'.encode('utf-8') Any help?
lonely
  • 681
  • 1
  • 7
  • 10
45
votes
4 answers

Python 3: os.walk() file paths UnicodeEncodeError: 'utf-8' codec can't encode: surrogates not allowed

This code: for root, dirs, files in os.walk('.'): print(root) Gives me this error: UnicodeEncodeError: 'utf-8' codec can't encode character '\udcc3' in position 27: surrogates not allowed How do I walk through a file tree without getting toxic…
Collin Anderson
  • 14,787
  • 6
  • 68
  • 57
44
votes
5 answers

UnicodeDecodeError: 'utf8' codec can't decode byte 0x80 in position 3131: invalid start byte

I am trying to read twitter data from json file using python 2.7.12. Code I used is such: import json import sys reload(sys) sys.setdefaultencoding('utf-8') def get_tweets_from_file(file_name): tweets = [] …
wannabhappy
  • 531
  • 2
  • 5
  • 7
42
votes
3 answers

Correctly reading text from Windows-1252(cp1252) file in python

so okay, as the title suggests the problem I have is with correctly reading input from a windows-1252 encoded file in python and inserting said input into SQLAlchemy-MySql table. The current system setup: Windows 7 VM with "Roger Access Control…
Krisjanis Zvaigzne
  • 495
  • 1
  • 6
  • 7
42
votes
1 answer

Removing unicode \u2026 like characters in a string in python2.7

I have a string in python2.7 like this, This is some \u03c0 text that has to be cleaned\u2026! it\u0027s annoying! How do i convert it to this, This is some text that has to be cleaned! its annoying!
40
votes
6 answers

UnicodeDecodeError: ('utf-8' codec) while reading a csv file

What I am trying to do is read a CSV to make a DataFrame, make changes in a column, write the changed values back to the same CSV (to_csv), and then read that CSV again to make another DataFrame... there I am getting an error…
Satya
  • 5,470
  • 17
  • 47
  • 72
38
votes
1 answer

UnicodeEncodeError: 'ascii' codec can't encode character u'\xe9' in position 7: ordinal not in range(128)

I have this code: printinfo = title + "\t" + old_vendor_id + "\t" + apple_id + '\n' # Write file f.write (printinfo + '\n') But I get this error when running it: f.write(printinfo + '\n') UnicodeEncodeError: 'ascii' codec can't…
speedyrazor
  • 3,127
  • 7
  • 33
  • 51
37
votes
1 answer

Pipreqs: UnicodeDecodeError: 'charmap' codec can't decode byte 0x98 in position 1206: character maps to <undefined>

When I use pipreqs, I have this problem. I use anaconda and Russian Windows. root@DESKTOP-ETLLRI1 C:\Users\root\Desktop\resumes $ pipreqs C:\Users\root\Desktop\resumes Traceback (most recent call last): File…
krax1337
  • 533
  • 1
  • 5
  • 13