I am querying from an Oracle 12c database and i need to filter out values which are in Japanese because the Java application receiving these values currently do not have the ability to display Japanese characters. For a quick demo, is there any way we can avoid the Japanese strings from the results?
Asked
Active
Viewed 2,382 times
0
-
by only accepting English characters ? `REGEXP_REPLACE` to replace any character which is not in English with `''` – Sudipta Mondal Apr 26 '17 at 10:10
-
Possible duplicate of [Unicode range for Japanese](http://stackoverflow.com/questions/19899554/unicode-range-for-japanese) – Shnugo Apr 26 '17 at 10:11
-
@Shnugo Not a duplicate - that just gives the unicode ranges for Japanese characters but does not show how to filter those values from a string in Oracle. – MT0 Apr 26 '17 at 10:53
-
You could set NLS_LANG parameter to character set which is supported by your application. Then Oracle will replace any invalid character by ¿ automatically. – Wernfried Domscheit Apr 26 '17 at 15:20
3 Answers
0
I did not altered the NLS_LANG format to insert the Japanese character properly. Using regexp_like you can filter records which don't have alphanumeric data.
create table table1 (name varchar2(100));
insert into table1 (name) values ('ABC');
insert into table1 (name) values ('DEF');
insert into table1 (name) values ('GHI');
insert into table1 (name) values ('JKL');
insert into table1 (name) values ('GHI');
insert into table1 (name) values ('昨夜のコンサ');
insert into table1 (name) values ('昨夜のABC');
select * from table1;
select * from table1 where regexp_like (name,'^[0-9a-zA-Z]+$');
select nvl(regexp_replace(name,'[^0-9a-zA-Z'']',''),'blank') from table1;

Sandeep
- 774
- 3
- 8
-
This will simply ignore all the records, which have Japanese characters, The OP only wants to filter out the Japanese characters. – Sudipta Mondal Apr 26 '17 at 10:23
-
@SudiptaMondal Thanks for pointing that out. I have now edited my answer. – Sandeep Apr 26 '17 at 10:50
-
This will ignore too many values including punctuation and characters in languages other than Japanese. The OP's question is only about filtering out Japanese characters (and not other languages). – MT0 Apr 26 '17 at 10:57
0
You will need to replace all Non English characters with NULL for all the columns you have in your select statement.
select regexp_replace (column_name,'[^\x80-\xFF]',NULL) from table_name;
This will replace anything which is not in English characters to NULL

Sudipta Mondal
- 2,550
- 1
- 19
- 20
-
This will match too many characters; the OP does not want to remove non-English characters, just Japanese characters. – MT0 Apr 26 '17 at 10:55
-
@MT0 wouldn't it be safe to assume, the Java application might have a restriction to only display English characters – Sudipta Mondal Apr 26 '17 at 11:01
-
-
Thanks Sudipta.. We used a similar condition to check the ascii value of the characters and anything above a certain value was filtered out, – manu muraleedharan Jun 03 '17 at 11:16
0
Given the unicode ranges for Japanese characters you can just replace the Japanese characters:
SELECT REGEXP_REPLACE(
your_column,
UNISTR( '[\3000-\303f\3040-\309f\30a0-\30ff\ff00-\ffef\4e00-\9faf]' ),
NULL
)
FROM your_table
or ignore those rows:
SELECT *
FROM your_table
WHERE NOT REGEXP_LIKE(
your_column,
UNISTR( '[\3000-\303f\3040-\309f\30a0-\30ff\ff00-\ffef\4e00-\9faf]' )
)