I'm using Oracle REGEXP_REPLACE to remove all non standard ascii (special) characters from a CLOB freetext field a client uses to store data, some users copy and paste from Excel which causes issues when reading the text from the CLOB. I've managed to get rid of all special characters and maintain the layout for paragraphs etc.
The issue I'm having is when I compare the cleansed field to the original, after converting to char string of course, is it is returning rows that have no difference between them, I've pulled out both values and there is no difference in most cases. It's returning the whole dataset instead of just those that have been cleansed.I've run out of ideas
FYI a big part of maintaining the layout means I need to keep the carriage return CHR(13)
Below is the query
select *
from (
select incident, entity,
trim(to_char(textarea)) textarea,
trim(to_char(regexp_replace(textarea,'[^' || CHR(13) || CHR(32) || '-' || CHR(126) || ']',''))) regexp_text
from response
) tab
where tab.regexp_text <> tab.textarea
I've tried a number of different combinations with trimming whitespace and substring smaller strings etc. with no luck