1

I have some problems with characters not allowed in XML-export. The request is simply to remove the invalid characters from the text to ensure corrext export/import of data.

so far, thanks to this post I have an sql statement which replaces one declared Unicode char:

UPDATE mytable SET myproperty = replace(myproperty, U&'\UNICODE', '');

It removes the specified Unicode from all occurrences inside the defined cell. But I would like to give a bunch of Unicode chars to be replaced, eg. 0x1 - 0xf Does anybody have an idea how to achieve that?

Community
  • 1
  • 1
Ruth
  • 856
  • 2
  • 13
  • 26

1 Answers1

4

Use TRANSLATE to remove multiple characters:

TRANSLATE(myproperty, 'list of characters', '');

Normally it's used to replace characters, but if the replacement string is empty it's removing them.

Edit:

Of course you can also use a regular expression, this allows ranges of characters:

regexp_replace(myproperty, '[\u0041-\u0043a]', '', 'g')

Adding the 'g' flag to replace all occurrences...

dnoeth
  • 59,503
  • 4
  • 39
  • 56
  • thanks @dnoeth, that helps a little. do you know a way to set a range of codes? I tried that out but it does not work (eg. `U&'[\0041-\0043]'`) – Ruth Mar 17 '16 at 11:00
  • @RuthiRuth: There's no way to specifiy a range in `TRANSLATE`, but there's `REGEXP_REPLACE`. I modify my answer... – dnoeth Mar 17 '16 at 11:27
  • I tried that but then I get only the first occurrence in "myproperty" and not all occurences... – Ruth Mar 17 '16 at 12:59
  • I got it, I have to set the flag 'g' to replace it with all occurrences. if you could edit your answer @dnoeth I can mark it as right :) – Ruth Mar 17 '16 at 13:47
  • @RuthiRuth: Thanks, every DBMS has it's own rules regarding regex. Added it to my answer... – dnoeth Mar 17 '16 at 14:34
  • argh, I have to write a compatible code for database 8.4 now, and it always says: `WARNING: nonstandard use of escape in a string literal, HINT: Use the escape string syntax for escapes, e.g., E'\r\n'.` so how can I use this method with 8.4? could not find anything to match unicode chars so far :( – Ruth Mar 21 '16 at 09:41
  • @RuthiRuth: Sorry, I don't have access to a 8.4 system, but according to the official documentation it's the same syntax for Unicode literals... – dnoeth Mar 21 '16 at 19:14
  • This for multiple ranges (\u3000-\u309F]|[\u30A0-\u30FF]|[\u4E00-\u9FFF]|[\uFF00-\uFFEF] – sparkle Dec 15 '21 at 22:04