I am using Python to filter verses in a religious Arabic text (the Quran) that contain certain words or characters. The program works for some characters and writes a CSV file with the filtered verses, but for other characters the output shows strange non-Arabic symbols. For example, when checking for the Arabic letter "Lam" (U+0644), the output CSV is perfect, but when checking for the Arabic letter "Kaf" (U+0643), I get a bunch of symbols like سÙورَة٠الÙÙŽØ§ØªÙØÙŽØ©Ù. Thank you in advance for the help. My code:
import csv

# Read every row of the source CSV into a list.
mylist = []
with open("Arabic-Original.csv", "r", encoding="utf-8") as file:
    csvreader = csv.reader(file)
    for row in csvreader:
        mylist.append(row)

s = f'{chr(0x0644)}'  # letter to search for: Lam (U+0644)

# Copy each verse (first column) that contains the letter.
f = open("copiedverses.csv", "w", encoding="utf-8")
for i in range(len(mylist)):
    if s in mylist[i][0]:
        f.write(mylist[i][0] + "\n")
f.close()
Using "Lam" (U+0644), I get readable Arabic text in the output file. [screenshot not preserved]
Using "Kaf" (U+0643), I get the garbled symbols instead. [screenshot not preserved]
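One way to narrow this down is to read the output file back in Python, so the result does not depend on whichever program displays the CSV. The following is only a minimal sketch under stated assumptions: the helper names `filter_rows` and `round_trip` and the three sample words are hypothetical stand-ins, not the real data or the code above.

```python
import os
import tempfile

LAM = "\u0644"  # Arabic letter Lam
KAF = "\u0643"  # Arabic letter Kaf

# Hypothetical one-column rows standing in for the real CSV contents.
SAMPLE_ROWS = [
    ["\u0643\u062a\u0627\u0628"],  # contains Kaf, no Lam
    ["\u0642\u0644\u0645"],        # contains Lam, no Kaf
    ["\u0645\u0644\u0643"],        # contains both Lam and Kaf
]


def filter_rows(rows, letter):
    """Return the first column of each non-empty row that contains `letter`."""
    return [row[0] for row in rows if row and letter in row[0]]


def round_trip(verses):
    """Write verses to a temporary UTF-8 file, read it back, return the lines."""
    fd, path = tempfile.mkstemp(suffix=".csv")
    os.close(fd)
    try:
        with open(path, "w", encoding="utf-8", newline="") as out:
            for verse in verses:
                out.write(verse + "\n")
        with open(path, "r", encoding="utf-8", newline="") as src:
            return src.read().splitlines()
    finally:
        os.remove(path)


if __name__ == "__main__":
    for name, letter in (("Lam", LAM), ("Kaf", KAF)):
        wanted = filter_rows(SAMPLE_ROWS, letter)
        # True means the text on disk matches what was filtered in memory.
        print(name, round_trip(wanted) == wanted)
```

If the comparison holds for both letters on the real file as well, the bytes on disk are presumably fine and the difference would lie in how the file is opened for viewing.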
The code works well for some letters but not for others. I tried several letters that are similar to each other, but I still can't work out why the output is Arabic for some letters and not for others. Thank you.
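The symbols resemble what UTF-8 bytes look like when they are decoded with a single-byte Western encoding. The sketch below reproduces that effect for the two letters. It is an illustration of one possible cause, not a confirmed diagnosis; the function name `misdecode` and the choice of `cp1252` as the wrong encoding are assumptions.

```python
LAM = "\u0644"  # Arabic letter Lam
KAF = "\u0643"  # Arabic letter Kaf


def misdecode(text, wrong_encoding="cp1252"):
    """Encode `text` as UTF-8, then decode those bytes with the wrong encoding.

    Bytes that the wrong encoding cannot map are replaced with U+FFFD.
    """
    return text.encode("utf-8").decode(wrong_encoding, errors="replace")


if __name__ == "__main__":
    for name, letter in (("Lam", LAM), ("Kaf", KAF)):
        raw = letter.encode("utf-8")
        # Each letter is two bytes in UTF-8, so it shows up as two symbols.
        print(name, raw, ascii(misdecode(letter)))
```

Both letters turn into two Latin-range symbols under this wrong decoding, so if this is the cause, the difference between the two output files would come from how the viewer guesses the encoding rather than from the filter. In that case, writing the file with `encoding="utf-8-sig"` (which prepends a byte order mark) may help some viewers detect UTF-8, though that is an untested guess here.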