How can i filter out the Japanese values in an SQL Query?

Question

I am querying from an Oracle 12c database and i need to filter out values which are in Japanese because the Java application receiving these values currently do not have the ability to display Japanese characters. For a quick demo, is there any way we can avoid the Japanese strings from the results?

by only accepting English characters ? `REGEXP_REPLACE` to replace any character which is not in English with `''` — Sudipta Mondal, Apr 26 '17 at 10:10
Possible duplicate of [Unicode range for Japanese](http://stackoverflow.com/questions/19899554/unicode-range-for-japanese) — Shnugo, Apr 26 '17 at 10:11
@Shnugo Not a duplicate - that just gives the unicode ranges for Japanese characters but does not show how to filter those values from a string in Oracle. — MT0, Apr 26 '17 at 10:53
You could set NLS_LANG parameter to character set which is supported by your application. Then Oracle will replace any invalid character by ¿ automatically. — Wernfried Domscheit, Apr 26 '17 at 15:20

Sandeep · Answer 1 · 2017-04-26T10:49:30.743

0

I did not altered the NLS_LANG format to insert the Japanese character properly. Using regexp_like you can filter records which don't have alphanumeric data.

create table table1 (name varchar2(100));
insert into table1 (name) values ('ABC');
insert into table1 (name) values ('DEF');
insert into table1 (name) values ('GHI');
insert into table1 (name) values ('JKL');
insert into table1 (name) values ('GHI');
insert into table1 (name) values ('昨夜のコンサ');
insert into table1 (name) values ('昨夜のABC');

select * from table1;

select * from table1 where regexp_like (name,'^[0-9a-zA-Z]+$');

select nvl(regexp_replace(name,'[^0-9a-zA-Z'']',''),'blank') from table1;

edited Apr 26 '17 at 10:49

answered Apr 26 '17 at 10:18

Sandeep

774
3
8

This will simply ignore all the records, which have Japanese characters, The OP only wants to filter out the Japanese characters. – Sudipta Mondal Apr 26 '17 at 10:23
@SudiptaMondal Thanks for pointing that out. I have now edited my answer. – Sandeep Apr 26 '17 at 10:50
This will ignore too many values including punctuation and characters in languages other than Japanese. The OP's question is only about filtering out Japanese characters (and not other languages). – MT0 Apr 26 '17 at 10:57

score 0 · Answer 2 · answered Apr 26 '17 at 10:43

0

You will need to replace all Non English characters with NULL for all the columns you have in your select statement.

select regexp_replace (column_name,'[^\x80-\xFF]',NULL) from table_name;

This will replace anything which is not in English characters to NULL

answered Apr 26 '17 at 10:43

Sudipta Mondal

2,550
1
19
20

This will match too many characters; the OP does not want to remove non-English characters, just Japanese characters. – MT0 Apr 26 '17 at 10:55
@MT0 wouldn't it be safe to assume, the Java application might have a restriction to only display English characters – Sudipta Mondal Apr 26 '17 at 11:01
You may assume that but that is not what the OP's question is asking. – MT0 Apr 26 '17 at 11:02
Thanks Sudipta.. We used a similar condition to check the ascii value of the characters and anything above a certain value was filtered out, – manu muraleedharan Jun 03 '17 at 11:16

score 0 · Answer 3 · edited May 23 '17 at 12:34

Given the unicode ranges for Japanese characters you can just replace the Japanese characters:

SELECT REGEXP_REPLACE(
         your_column,
         UNISTR( '[\3000-\303f\3040-\309f\30a0-\30ff\ff00-\ffef\4e00-\9faf]' ),
         NULL
       )
FROM   your_table

or ignore those rows:

SELECT *
FROM   your_table
WHERE  NOT REGEXP_LIKE(
         your_column,
         UNISTR( '[\3000-\303f\3040-\309f\30a0-\30ff\ff00-\ffef\4e00-\9faf]' )
       )

How can i filter out the Japanese values in an SQL Query?

3 Answers3