0

I have a column with both English and Chinese text.

Example: The hills have eyes. 隔山有眼

Expected results: The hills have eyes.

How can I extract the English text from that string using sql, please.

Thanks for help.

Diamond
  • 1
  • 1
  • Maybe this helps: [Javascript: How to extract english words or Chinese characters from a string?](https://stackoverflow.com/questions/58637490/javascript-how-to-extract-english-words-or-chinese-characters-from-a-string), you only need to convert that idea to MySQL, or SQL-server (you are not clear in your tags about which RDBMS you are using) – Luuk Jun 26 '21 at 11:57
  • SQL Server and MySQL are very different databases. Tag only with the database you are really using. – Gordon Linoff Jun 26 '21 at 11:57

2 Answers2

0

A quick-and-dirty way simply converts the string to ASCII and removes the '?' -- which is the representation of the other characters:

select replace(convert(t.str using ascii), '?', '')
from t;

The only downside is that you lose '?' characters in the original string as well.

Here is a db<>fiddle.

For more control over the replacement, you can use regexp_replace():

select regexp_replace(t.str, '[^a-zA-Z0-9.?, ]', '')
from t;

Unfortunately, I am not aware of a character class for ASCII-only characters.

Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786
0

One option you have is to use a function that returns just the english only text.

Additionally, you could make it dual-purpose with another parameter to determine if you want the English text or Non-English text to switch the <127 comparison.

CREATE FUNCTION `EnglishOnly`(String VARCHAR(100)) 
RETURNS varchar(100)
NO SQL
BEGIN
    DECLARE output VARCHAR(100) DEFAULT '';
    DECLARE i INTEGER DEFAULT 1;
    DECLARE ch varchar(1);

    IF LENGTH(string) > 0 THEN
        WHILE(i <= LENGTH(string)) DO
            SET ch=SUBSTRING(string, i, 1);
            IF ASCII(ch)<127 then
              set output = CONCAT(output,ch);
             END IF;
            SET i = i + 1;
        END WHILE;
    END IF;    
    RETURN output;
END;

You can then sinply use it like so

select EnglishOnly ("The hills have eyes 隔山有眼that see all.")

Output

The hills have eyes that see all.

Example Fiddle

Stu
  • 30,392
  • 6
  • 14
  • 33