Finding an Arabic-script word in a string gives the error 'ascii' codec can't decode

Question

I wrote this function to check whether a Persian month name exists in a unicode string and, if so, replace it with the number of the month. I use this encoding declaration in the header:

`#!/usr/bin/python
# -*- coding: utf-8 -*-`

This is my function to convert the month:

def changeData(date):
    if date:
        date.encode('utf-8')
        if "فروردین".encode('utf-8') in date:
            return str.replace(":فروردین", ":1")
        elif "اردیبهشت".encode('utf-8') in date:
            return str.replace(":اردیبهشت", ":2")
        elif "خرداد".encode('utf-8') in date:
            return str.replace(":خرداد", ":3")
        elif "تیر".encode('utf-8') in date:
            return str.replace(":تیر", ":4")
        elif "مرداد".encode('utf-8') in date:
            return str.replace(":مرداد", ":5")
        elif "شهریور".encode('utf-8') in date:
            return str.replace(":شهریور", ":6")
        elif "مهر".encode('utf-8') in date:
            return str.replace(":مهر", ":7")
        elif "آبان".encode('utf-8') in date:
            return str.replace(":آبان", ":8")
        elif "آذر".encode('utf-8') in date:
            return str.replace(":آذر", ":9")
        elif "دی".encode('utf-8') in date:
            return str.replace(":دی", ":10")
        elif "بهمن".encode('utf-8') in date:
            return str.replace(":بهمن", ":11")
        elif "اسفند".encode('utf-8') in date:
            return str.replace(":اسفند", ":12")

I pass the date to the function as unicode and then convert it with encode('utf-8'), but I get this error:

if "فروردین".encode('utf-8') in date:
UnicodeDecodeError: 'ascii' codec can't decode byte 0xd9 in position 0: ordinal not in range(128)

How can I solve this problem?

Duplicate? https://stackoverflow.com/questions/9644099/python-ascii-codec-cant-decode-byte — Tsagana Nokhaeva, Nov 01 '17 at 11:44

MaximTitarenko · Accepted Answer · 2017-11-01T12:48:56.130

I assume Python 2.7.

So:

"فروردین".encode('utf-8') # UnicodeDecodeError: 'ascii' codec can't decode byte 0xd9 in position 0: ordinal not in range(128)

The problem is that in Python 2.7 plain string literals are byte strings:

print(repr("فروردین")) # '\xd9\x81\xd8\xb1\xd9\x88\xd8\xb1\xd8\xaf\xdb\x8c\xd9\x86'

With the following code:

"فروردین".encode('utf-8')

you're trying to encode bytes, which is logically incorrect because:

ENCODING: unicode --> bytes 
DECODING: bytes --> unicode

However, Python 2 doesn't raise something like a TypeError here.
Instead, it first tries to decode the given bytes to unicode and then applies the encoding specified by the user.
The problem is that this implicit decoding uses the default encoding, which is ASCII in Python 2. The bytes of the Persian word are not valid ASCII, so the program terminates with the UnicodeDecodeError.
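
To make the hidden step visible, here is a minimal sketch that spells out what the call amounts to. The helper name `implicit_encode` is hypothetical, and the snippet uses a bytes literal so that it behaves the same on Python 2.7 and Python 3:

```python
# -*- coding: utf-8 -*-
# UTF-8 bytes of the month name, i.e. what a plain "..." literal holds in Python 2.
raw = b"\xd9\x81\xd8\xb1\xd9\x88\xd8\xb1\xd8\xaf\xdb\x8c\xd9\x86"


def implicit_encode(byte_string, default_encoding="ascii"):
    """Hypothetical helper: mimic calling .encode('utf-8') on a Python 2 byte string."""
    # Step 1 (implicit in Python 2): decode with the default encoding.
    # Step 2 (what the user asked for): encode the result as UTF-8.
    return byte_string.decode(default_encoding).encode("utf-8")


try:
    implicit_encode(raw)
    error_message = None
except UnicodeDecodeError as exc:
    # Step 1 fails because 0xd9 is not a valid ASCII byte.
    error_message = str(exc)

print(error_message)
```

Pure-ASCII byte strings pass through step 1 without trouble, which is why this kind of bug often goes unnoticed until non-ASCII text shows up.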

This implicit decoding is similar to:

unicode("فروردین") # UnicodeDecodeError: 'ascii' codec can't decode byte 0xd9 in position 0: ordinal not in range(128)

So you shouldn't encode a byte string; you have to DECODE it in order to get unicode:

u = "فروردین".decode('utf-8') 
print(type(u)) # <type 'unicode'>

Another way to get unicode is to use a u-literal together with an encoding declaration:

# coding: utf-8

u = u"فروردین"
print(type(u)) # <type 'unicode'> 

print(u == "فروردین".decode('utf-8')) # True
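
Applied to the function in the question, a minimal sketch could look like the following. The names `PERSIAN_MONTHS` and `change_date` are hypothetical. The sketch assumes the month name is preceded by a colon, as in the `replace` calls of the question. It also calls `replace` on the date itself rather than on `str`, and returns the input unchanged when no month is found; both are assumptions about the intended behaviour:

```python
# -*- coding: utf-8 -*-

# Persian month names paired with their month numbers, all as unicode literals.
PERSIAN_MONTHS = [
    (u"فروردین", u"1"), (u"اردیبهشت", u"2"), (u"خرداد", u"3"),
    (u"تیر", u"4"), (u"مرداد", u"5"), (u"شهریور", u"6"),
    (u"مهر", u"7"), (u"آبان", u"8"), (u"آذر", u"9"),
    (u"دی", u"10"), (u"بهمن", u"11"), (u"اسفند", u"12"),
]


def change_date(date):
    """Replace ':<month name>' with ':<month number>' in a date string."""
    if not date:
        return date
    # Decode UTF-8 byte strings first so that unicode is only compared with unicode.
    if isinstance(date, bytes):
        date = date.decode("utf-8")
    for name, number in PERSIAN_MONTHS:
        marker = u":" + name
        if marker in date:
            return date.replace(marker, u":" + number)
    return date  # no month name found: leave the input as it is


print(change_date(u"1396:" + u"مهر" + u":12"))
```

Keeping the names in one list avoids the long `if`/`elif` chain, and nothing is ever encoded: the comparison happens entirely in unicode.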

or just declare utf-8 in the source file, and prefix the strings with `u` like in `u"فروردین"` - that way calling `decode` is not necessary. — jsbueno, Nov 01 '17 at 12:34
