This is an old question, but the existing analysis is somewhat incomplete. At its simplest: not all word-forming characters are alphabetic characters, so isalpha() is insufficient for matching words. Python defines alphabetic characters as those Unicode characters assigned the general categories “Lm”, “Lt”, “Lu”, “Ll”, or “Lo”.
This excludes many word-forming characters, including combining diacritics, dependent vowels in South Asian and South East Asian languages, the punt volat (middle dot, U+00B7) in Catalan, etc.
Additionally, Python's definition of an alphabetic character doesn't always align with Unicode's definition. The Unicode Alphabetic property covers the general categories “Lm”, “Lt”, “Lu”, “Ll”, “Lo”, and “Nl”, plus characters with the Other_Alphabetic property.
The question gives the results for Python's interpretation:
for i in ['මට', 'කෑම', 'කන්න', 'ඕන']:
    print(i.isalpha())
Results in:
True
False
False
True
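To see which characters cause the False results, it can help to list the general category of every character that isalpha() rejects. The following is a minimal sketch using only the standard library's unicodedata module; the helper name non_alpha_chars is made up for illustration and is not part of the question.

```python
import unicodedata

# General categories that str.isalpha() accepts.
ALPHA_CATEGORIES = {"Lm", "Lt", "Lu", "Ll", "Lo"}

def non_alpha_chars(word):
    """Return (code point, category) pairs for characters isalpha() rejects."""
    return [
        (f"U+{ord(ch):04X}", unicodedata.category(ch))
        for ch in word
        if unicodedata.category(ch) not in ALPHA_CATEGORIES
    ]

for word in ['මට', 'කෑම', 'කන්න', 'ඕන']:
    print(word.isalpha(), non_alpha_chars(word))
```

For the second word this should report a spacing mark (Mc, a dependent vowel sign) and for the third a nonspacing mark (Mn, the virama), which is why both fail isalpha().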
For the Unicode definition, using the third-party regex module:
import regex
for i in ['මට', 'කෑම', 'කන්න', 'ඕන']:
    print(bool(regex.match(r'^\p{Alphabetic}+$', i)))
With the results:
True
True
False
True
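This is slightly better, but still not sufficient: the third word contains the virama U+0DCA, a nonspacing mark (Mn) that does not have the Alphabetic property. One possible fix is to expand the regex pattern so that combining marks and the middle dot are allowed after an initial alphabetic character:
for i in ['මට', 'කෑම', 'කන්න', 'ඕන']:
    if len(i) == 1:
        result = bool(regex.match(r'[\p{Alphabetic}]', i))
    else:
        result = bool(regex.match(r'^\p{Alphabetic}[\p{Alphabetic}\p{Mn}\p{Mc}\u00B7]*$', i))
    print(result)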
Which gives:
True
True
True
True
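If the regex module is not available, a similar check can be approximated with the standard library alone. This is only a sketch under stated assumptions: unicodedata does not expose the Alphabetic property, so the code below substitutes general categories (letters and letter numbers at the start, then also Mn and Mc marks and U+00B7), which is close to but not identical to the pattern above. The name is_word is hypothetical.

```python
import unicodedata

# Categories allowed at the start of a word: letters and letter numbers.
START_CATEGORIES = {"Lm", "Lt", "Lu", "Ll", "Lo", "Nl"}
# Categories allowed after the first character: the above plus combining marks.
CONTINUE_CATEGORIES = START_CATEGORIES | {"Mn", "Mc"}
MIDDLE_DOT = "\u00B7"  # punt volat, as in Catalan "col·lecció"

def is_word(text):
    """Approximate word check by general category (not the Alphabetic property)."""
    if not text or unicodedata.category(text[0]) not in START_CATEGORIES:
        return False
    return all(
        ch == MIDDLE_DOT or unicodedata.category(ch) in CONTINUE_CATEGORIES
        for ch in text[1:]
    )

for word in ['මට', 'කෑම', 'කන්න', 'ඕන']:
    print(is_word(word))
```

Because it works from categories rather than properties, this version can disagree with \p{Alphabetic} on edge cases, so it is better treated as a fallback than as a replacement.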
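Alternatively, use the \w metacharacter, which in the regex module matches word-forming characters (note that it also accepts digits and the underscore). The pattern is anchored so that the whole string must match:
for i in ['මට', 'කෑම', 'කන්න', 'ඕන']:
    print(bool(regex.match(r'^\w+$', i)))
Which gives: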
True
True
True
True
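Two caveats apply to the \w approach. Without anchoring, match only needs a run of word characters at the start of the string. Also, the standard library's re module defines \w more narrowly than regex does (roughly, alphanumeric characters plus the underscore), so at the time of writing it does not match combining marks. A small comparison sketch using only the standard library:

```python
import re

word = 'කන්න'

# Unanchored: succeeds as soon as a leading run of word characters is found,
# even if the rest of the string is something else entirely.
print(bool(re.match(r'\w+', 'hello world!')))

# With re, the unanchored match on the Sinhala word stops before the virama,
# so only part of the word is matched.
partial = re.match(r'\w+', word)
print(partial.group() == word)

# Anchored with fullmatch: re rejects the word because of the combining mark.
print(bool(re.fullmatch(r'\w+', word)))
```

This is why the examples above rely on regex rather than re, and why anchoring the pattern (or using fullmatch) matters when validating a whole string.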