0

I am querying from an Oracle 12c database and i need to filter out values which are in Japanese because the Java application receiving these values currently do not have the ability to display Japanese characters. For a quick demo, is there any way we can avoid the Japanese strings from the results?

  • by only accepting English characters ? `REGEXP_REPLACE` to replace any character which is not in English with `''` – Sudipta Mondal Apr 26 '17 at 10:10
  • Possible duplicate of [Unicode range for Japanese](http://stackoverflow.com/questions/19899554/unicode-range-for-japanese) – Shnugo Apr 26 '17 at 10:11
  • @Shnugo Not a duplicate - that just gives the unicode ranges for Japanese characters but does not show how to filter those values from a string in Oracle. – MT0 Apr 26 '17 at 10:53
  • You could set NLS_LANG parameter to character set which is supported by your application. Then Oracle will replace any invalid character by ¿ automatically. – Wernfried Domscheit Apr 26 '17 at 15:20

3 Answers3

0

I did not altered the NLS_LANG format to insert the Japanese character properly. Using regexp_like you can filter records which don't have alphanumeric data.

create table table1 (name varchar2(100));
insert into table1 (name) values ('ABC');
insert into table1 (name) values ('DEF');
insert into table1 (name) values ('GHI');
insert into table1 (name) values ('JKL');
insert into table1 (name) values ('GHI');
insert into table1 (name) values ('昨夜のコンサ');
insert into table1 (name) values ('昨夜のABC');

select * from table1;

select * from table1 where regexp_like (name,'^[0-9a-zA-Z]+$');

select nvl(regexp_replace(name,'[^0-9a-zA-Z'']',''),'blank') from table1;
Sandeep
  • 774
  • 3
  • 8
  • This will simply ignore all the records, which have Japanese characters, The OP only wants to filter out the Japanese characters. – Sudipta Mondal Apr 26 '17 at 10:23
  • @SudiptaMondal Thanks for pointing that out. I have now edited my answer. – Sandeep Apr 26 '17 at 10:50
  • This will ignore too many values including punctuation and characters in languages other than Japanese. The OP's question is only about filtering out Japanese characters (and not other languages). – MT0 Apr 26 '17 at 10:57
0

You will need to replace all Non English characters with NULL for all the columns you have in your select statement.

select regexp_replace (column_name,'[^\x80-\xFF]',NULL) from table_name;

This will replace anything which is not in English characters to NULL

Sudipta Mondal
  • 2,550
  • 1
  • 19
  • 20
  • This will match too many characters; the OP does not want to remove non-English characters, just Japanese characters. – MT0 Apr 26 '17 at 10:55
  • @MT0 wouldn't it be safe to assume, the Java application might have a restriction to only display English characters – Sudipta Mondal Apr 26 '17 at 11:01
  • You may assume that but that is not what the OP's question is asking. – MT0 Apr 26 '17 at 11:02
  • Thanks Sudipta.. We used a similar condition to check the ascii value of the characters and anything above a certain value was filtered out, – manu muraleedharan Jun 03 '17 at 11:16
0

Given the unicode ranges for Japanese characters you can just replace the Japanese characters:

SELECT REGEXP_REPLACE(
         your_column,
         UNISTR( '[\3000-\303f\3040-\309f\30a0-\30ff\ff00-\ffef\4e00-\9faf]' ),
         NULL
       )
FROM   your_table

or ignore those rows:

SELECT *
FROM   your_table
WHERE  NOT REGEXP_LIKE(
         your_column,
         UNISTR( '[\3000-\303f\3040-\309f\30a0-\30ff\ff00-\ffef\4e00-\9faf]' )
       )
Community
  • 1
  • 1
MT0
  • 143,790
  • 11
  • 59
  • 117