How can I check whether a string is Thai, returning a boolean like isalpha()?

Question

I'm trying to check whether a str contains only Thai characters, using a regex or any other approach that works.

I'm trying to use

regexp_thai = re.compile(u"[^\u0E00-\u0E7F']|^'|'$|''")
ret = regexp_thai.sub("", s)

to strip out other languages and digits, but that only removes characters; it does not return a boolean.

I expect output like

import re

s = "engภาษาไทยที่มีสระ123!@"
regexp_thai = re.compile(u"[^\u0E00-\u0E7F']|^'|'$|''") 
ret = regexp_thai.sub("", s)
print(ret)             # ภาษาไทยที่มีสระ
print(isthai(ret))     # True

\u0E00-\u0E7F is the Unicode range for Thai. How can I write the isthai function?

Basically `bool(re.match("^[\u0E00-\u0E7F]*$", test))` should evaluate to `True` iff `test` only consists of Thai characters. Fine tuning for punctuation et al is necessary yet. — Michael Butscher, May 24 '19 at 03:57
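The approach in the comment above could be wrapped into a function roughly like this. It is only a sketch, assuming Python 3 and assuming "Thai" means every character falls in the Unicode block U+0E00 to U+0E7F (which also contains Thai digits and the baht sign). It uses `re.fullmatch` instead of `re.match` with `^...$`, because `$` also matches just before a trailing newline, and `+` instead of `*` so that an empty string gives `False`, the same way `"".isalpha()` does.

```python
import re

# Assumption: "Thai" = any code point in the Thai Unicode block U+0E00-U+0E7F.
_THAI_ONLY = re.compile("[\u0E00-\u0E7F]+")


def isthai(text):
    """Return True if text is non-empty and every character is in the Thai block."""
    return _THAI_ONLY.fullmatch(text) is not None


print(isthai("ภาษาไทยที่มีสระ"))          # True
print(isthai("engภาษาไทยที่มีสระ123!@"))  # False
print(isthai(""))                       # False
```

If spaces or punctuation should also count as acceptable, they would have to be added to the character class explicitly.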

score 4 · Accepted Answer · answered May 24 '19 at 03:59

I'm not quite sure what the desired output is. However, I'm guessing that we want to capture the Thai letters. Based on your original expression, we could put the Thai range in a character class, wrap it in a capturing group, and collect the Thai letters from left to right, maybe similar to:

([\u0E00-\u0E7F]+)

Test

# coding=utf8
# the above tag defines encoding for this document and is for Python 2.x compatibility

import re

regex = r"([\u0E00-\u0E7F]+)"

test_str = "engภาษาไทยที่มีสระ123!@"

matches = re.finditer(regex, test_str, re.MULTILINE | re.UNICODE)

for matchNum, match in enumerate(matches, start=1):
    print("Match {matchNum} was found at {start}-{end}: {match}".format(matchNum=matchNum, start=match.start(), end=match.end(), match=match.group()))

    # Report each capturing group; groups are numbered from 1
    for groupNum in range(1, len(match.groups()) + 1):
        print("Group {groupNum} found at {start}-{end}: {group}".format(groupNum=groupNum, start=match.start(groupNum), end=match.end(groupNum), group=match.group(groupNum)))

# Note: for Python 2.7 compatibility, use ur"" to prefix the regex and u"" to prefix the test string and substitution.
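The test code above lists the Thai runs it finds, while the question asks for the stripped string plus a yes/no answer. One way to get both from the same pattern might look like the sketch below. The helper names `extract_thai` and `is_only_thai` are made up for illustration, and Python 3 is assumed.

```python
import re

# Same character range as the pattern above (Thai Unicode block).
THAI_RUN = re.compile("[\u0E00-\u0E7F]+")


def extract_thai(text):
    """Hypothetical helper: keep only the Thai runs and join them together."""
    return "".join(THAI_RUN.findall(text))


def is_only_thai(text):
    """Hypothetical helper: True if text is non-empty and nothing had to be removed."""
    return bool(text) and extract_thai(text) == text


s = "engภาษาไทยที่มีสระ123!@"
ret = extract_thai(s)
print(ret)                # ภาษาไทยที่มีสระ
print(is_only_thai(ret))  # True
print(is_only_thai(s))    # False
```

Comparing the extracted text with the input keeps a single pattern for both jobs, at the cost of building an extra string.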

Demo

const regex = /([\u0E00-\u0E7F]+)/gmu;
const str = `engภาษาไทยที่มีสระ123!@`;
let m;

while ((m = regex.exec(str)) !== null) {
    // This is necessary to avoid infinite loops with zero-width matches
    if (m.index === regex.lastIndex) {
        regex.lastIndex++;
    }
    
    // The result can be accessed through the `m`-variable.
    m.forEach((match, groupIndex) => {
        console.log(`Found match, group ${groupIndex}: ${match}`);
    });
}

RegEx

If this expression isn't what you wanted, it can be modified and tested at regex101.com.

RegEx Circuit

jex.im visualizes regular expressions.

Reference

Regular Expression to accept all Thai characters and English letters in python

This is a very good answer. You've got functioning code. You've got a visual showing how the regex works. You've got a link to a website that lets you tweak the regex until it does exactly what you want. If this was the Stackoverflow Olympics, and I was a judge, I would give your answer a perfect 10! — devdanke, Jul 19 '20 at 16:49