
I have a column called details in one table. The column stores email content in HTML format, and its data type is BLOB. I need to find every email whose content contains non-English characters, i.e. text in foreign languages.

The table has 51,000 records. Out of those, I need to filter only the emails that contain non-English characters; there may be 100 or more. Once I have filtered those records, I will identify the languages manually using Google Translate.

Nagendran

1 Answer


In MySQL it would be something like this:

SELECT foo 
FROM bar
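WHERE somecolumn REGEXP '[^\x00-\x7F]+'

This matches any character outside the ASCII range 0-127 (0x00-0x7F). Depending on your MySQL version and string escaping, you may need to double the backslashes ('[^\\x00-\\x7F]+'), and as far as I know \x escapes are only understood by the regex engine in MySQL 8.0 and later.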

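Some other patterns that may work, depending on your MySQL version:

WHERE NOT HEX(COLUMN) REGEXP '^([0-7][0-9A-F])*$';

REGEXP '[^[.NUL.]-[.DEL.]]'

REGEXP '[^ -~]'

The first one checks the hex form of the column, where every ASCII byte starts with a digit from 0 to 7. The last one matches anything outside printable ASCII, so it will also flag tabs and newlines, which HTML content is likely to contain.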

You will need to tune the pattern to your needs.
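To see why the HEX-based pattern works, here is a minimal Python sketch (not MySQL; has_non_ascii is a made-up helper name for illustration). It applies the same test to raw bytes: each byte is written as two hex digits, and a byte is ASCII exactly when its first hex digit is 0-7.

```python
import re

# Same idea as: NOT HEX(col) REGEXP '^([0-7][0-9A-F])*$'
ASCII_HEX = re.compile(r"([0-7][0-9A-F])*")


def has_non_ascii(blob: bytes) -> bool:
    """Return True if any byte in blob is outside 0x00-0x7F (hypothetical helper)."""
    hex_string = blob.hex().upper()  # two hex digits per byte, like MySQL HEX()
    return ASCII_HEX.fullmatch(hex_string) is None


if __name__ == "__main__":
    print(has_non_ascii(b"<p>Hello</p>"))
    print(has_non_ascii("<p>Привет</p>".encode("utf-8")))
```

One caveat for HTML email: foreign text written as numeric entities (for example &#1055;) is pure ASCII at the byte level, so a byte-level check like this would not flag it.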

There is also a novel approach shown in the question "How can I find non-ASCII characters in MySQL?":

WHERE columnToCheck <> CONVERT(columnToCheck USING ASCII)
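As I understand it, this works because converting to ASCII replaces any character that has no ASCII equivalent with a placeholder (? in MySQL), so the converted value no longer equals the original. A rough Python analogue, assuming the content has already been decoded to text (differs_from_ascii is a hypothetical name, not anything from MySQL):

```python
def differs_from_ascii(text: str) -> bool:
    """Return True if text changes when forced into ASCII (hypothetical helper)."""
    # Characters that cannot be encoded as ASCII are replaced with "?",
    # which loosely mirrors CONVERT(col USING ASCII) in MySQL.
    ascii_copy = text.encode("ascii", errors="replace").decode("ascii")
    return text != ascii_copy


if __name__ == "__main__":
    print(differs_from_ascii("Hello"))
    print(differs_from_ascii("Café"))
```

A literal question mark in the original text stays unchanged in both versions, so plain ASCII content is never flagged.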

ficuscr