I use this function to remove specific Unicode characters from a string:

    import re

    def removeTashkeel(original_text):
        # Decode the UTF-8 byte string, then strip the listed code points
        cleanText = re.sub(u'[\u064B-\u0652\u06D4\u0670\u0674\u06D5\u0695-\u06ED]+', '', original_text.decode("utf-8"))
        return cleanText

I pass this text as an argument:
"\nعذرا ڕ .بِسْمِ اللَّهِ الرَّحْمَٰنِ الرَّحِي.عشان هو ڕ http://www.google.com عمـــــر واحد# و@ الوقت\" \n اللي هنضيعه هنتحاسب عليه،:p:p:ppp انا مش هكمل الرواية دي :D it was really 3 4 awesome"
When I return `cleanText`, this is what I get:
u'\n\u0639\u0630\u0631\u0627 .\u0628\u0633\u0645 \u0627\u0644\u0644\u0647 \u0627\u0644\u0631\u062d\u0645\u0646 \u0627\u0644\u0631\u062d\u064a.\u0639\u0634\u0627\u0646 \u0647\u0648 http://www.google.com \u0639\u0645\u0640\u0640\u0640\u0640\u0640\u0631 \u0648\u0627\u062d\u062f# \u0648@ \u0627\u0644\u0648\u0642\u062a" \n \u0627\u0644\u0644\u064a \u0647\u0646\u0636\u064a\u0639\u0647 \u0647\u0646\u062a\u062d\u0627\u0633\u0628 \u0639\u0644\u064a\u0647\u060c:p:p:ppp \u0627\u0646\u0627 \u0645\u0634 \u0647\u0643\u0645\u0644 \u0627\u0644\u0631\u0648\u0627\u064a\u0629 \u062f\u064a :D it was really 3 4 awesome'
When I replace the line `return cleanText` with the line `print cleanText`, I get this result:
عذرا ڕ .بِسْمِ اللَّهِ الرَّحْمَٰنِ الرَّحِي.عشان هو ڕ: عمـــــر واحد و الوقت اللي هنضيعه هنتحاسب عليه،:p:p:ppp انا مش هكمل الرواية دي :D 3 4
How can I return the same result that `print` produces? I can no longer use the returned value as a normal string, even when I do `cleanText.encode('utf-8')`.
I would be very thankful for any help.
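
A minimal sketch of what may be going on, assuming the `\u....` escapes in the returned value are only the `repr` of a `unicode` object (which is what an interactive Python 2 session displays) and not a change to the data itself. The name `remove_tashkeel`, the bytes check, and the short sample input are illustrative and not part of the original function; the snippet is written so that it runs on both Python 2 and Python 3.

```python
# -*- coding: utf-8 -*-
import re

# Same character class as in the question, compiled once.
TASHKEEL = re.compile(u'[\u064B-\u0652\u06D4\u0670\u0674\u06D5\u0695-\u06ED]+')


def remove_tashkeel(original_text):
    """Hypothetical variant of removeTashkeel that accepts bytes or text."""
    if isinstance(original_text, bytes):
        # UTF-8 encoded input (a plain `str` on Python 2) is decoded first.
        original_text = original_text.decode('utf-8')
    return TASHKEEL.sub(u'', original_text)


# Sample input written with escapes: beh+kasra, seen+sukun, meem+kasra.
raw = u'\u0628\u0650\u0633\u0652\u0645\u0650'.encode('utf-8')
clean = remove_tashkeel(raw)

# The escaped spelling names the same three characters that print would render.
print(clean == u'\u0628\u0633\u0645')   # True
print(len(clean))                        # 3

# Encode only at the boundary (file, socket, terminal) where bytes are needed.
as_bytes = clean.encode('utf-8')
print(len(as_bytes))                     # 6
```

If that assumption holds, the returned value is already ordinary text and can be compared, sliced, or concatenated with other `unicode` strings directly; `print` or an explicit `.encode('utf-8')` only matters when the text is written out.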