I have a column with both English and Chinese text.
Example: The hills have eyes. 隔山有眼
Expected results: The hills have eyes.
How can I extract the English text from that string using sql, please.
Thanks for help.
I have a column with both English and Chinese text.
Example: The hills have eyes. 隔山有眼
Expected results: The hills have eyes.
How can I extract the English text from that string using sql, please.
Thanks for help.
A quick-and-dirty way simply converts the string to ASCII and removes the '?'
-- which is the representation of the other characters:
select replace(convert(t.str using ascii), '?', '')
from t;
The only downside is that you lose '?'
characters in the original string as well.
Here is a db<>fiddle.
For more control over the replacement, you can use regexp_replace()
:
select regexp_replace(t.str, '[^a-zA-Z0-9.?, ]', '')
from t;
Unfortunately, I am not aware of a character class for ASCII-only characters.
One option you have is to use a function that returns just the english only text.
Additionally, you could make it dual-purpose with another parameter to determine if you want the English text or Non-English text to switch the <127
comparison.
CREATE FUNCTION `EnglishOnly`(String VARCHAR(100))
RETURNS varchar(100)
NO SQL
BEGIN
DECLARE output VARCHAR(100) DEFAULT '';
DECLARE i INTEGER DEFAULT 1;
DECLARE ch varchar(1);
IF LENGTH(string) > 0 THEN
WHILE(i <= LENGTH(string)) DO
SET ch=SUBSTRING(string, i, 1);
IF ASCII(ch)<127 then
set output = CONCAT(output,ch);
END IF;
SET i = i + 1;
END WHILE;
END IF;
RETURN output;
END;
You can then sinply use it like so
select EnglishOnly ("The hills have eyes 隔山有眼that see all.")
Output
The hills have eyes that see all.
Example Fiddle