I have tried things like this, but there is no change between the input and output:
def remove_al(text):
if text.startswith('ال'):
text.replace('ال','')
return text
I have tried things like this, but there is no change between the input and output:
def remove_al(text):
if text.startswith('ال'):
text.replace('ال','')
return text
text.replace
returns the updated string but doesn't change it, you should change the code to
text = text.replace(...)
Note that in Python strings are "immutable"; there's no way to change even a single character of a string; you can only create a new string with the value you want.
If you want to only remove the prefix ال
and not all of ال
combinations in the string, I'd rather suggest to use:
def remove_prefix_al(text):
if text.startswith('ال'):
return text[2:]
return text
If you simply use text.replace('ال','')
, this will replace all ال
combinations:
Example
text = 'الاستقلال'
text.replace('ال','')
Output:
'استقل'
I would recommend the method str.lstrip
instead of rolling your own in this case.
example text (alrashid) in Arabic: 'الرَشِيد'
text = 'الرَشِيد'
clean_text = text.lstrip('ال')
print(clean_text)
Note that even though arabic reads from right to left, lstrip
strips the start of the string (which is visually to the right)
also, as user 6502 noted, the issue in your code is because python strings are immutable, thus the function was returning the input back
"ال" as prefix is quite complex in Arabic that you will need Regex to accurately separate it from its stem and other prefixes. The following code will help you isolate "ال" from most words:
import re
text = 'والشعر كالليل أسود'
words = text.split()
for word in words:
alx = re.search(r'''^
([وف])?
([بك])?
(لل)?
(ال)?
(.*)$''', word, re.X)
groups = [alx.group(1), alx.group(2), alx.group(3), alx.group(4), alx.group(5)]
groups = [x for x in groups if x]
print (word, groups)
Running that (in Jupyter) you will get: