You need to arrange to ignore whitespace when it appears within quotes. So, as suggested by one of the commenters:
\s+ | ( " (?: [^"\\] | \\ . ) * " ) // White-space inserted for readability
Match Java whitespace or a double-quoted string, where a string consists of " followed by any number of items that are each either a non-escape, non-quote character or an escape (\) plus any character, then a final ". This way, whitespace inside strings is never matched on its own. Then replace each match with $1 if $1 is not null, and with nothing otherwise.
Pattern clean = Pattern.compile(" \\s+ | ( \" (?: [^\"\\\\] | \\\\ . ) * \" ) ", Pattern.COMMENTS | Pattern.DOTALL);
StringBuffer sb = new StringBuffer();
Matcher m = clean.matcher( json );
while (m.find()) {
    m.appendReplacement(sb, "");
    // Don't pass m.group(1) to appendReplacement: it treats $ and \ in the replacement specially,
    // so a string containing $1, $2 or a backslash would be mangled or throw an exception.
    if (m.group(1) != null)
        sb.append(m.group(1));
}
m.appendTail(sb);
String cleanJson = sb.toString();
This is totally off the top of my head but I'm pretty sure it's close to what you want.
Edit: I've just got access to a Java IDE and tried out my solution. I had made a couple of mistakes with my code including using \. instead of . in the Pattern. So I have fixed that up and run it on a variation of your sample:
db.insert( {
_id:3,
cost:{_0:11},
description:"This is a \"description\" with an embedded newline: \"\n\".\nCool, isn\'t it?"
});
The code:
String json = "db.insert( {\n" +
" _id:3,\n" +
" cost:{_0:11},\n" +
" description:\"This is a \\\"description\\\" with an embedded newline: \\\"\\n\\\".\\nCool, isn\\'t it?\"\n" +
"});";
// insert above code
System.out.println(cleanJson);
This produces:
db.insert({_id:3,cost:{_0:11},description:"This is a \"description\" with an embedded newline: \"\n\".\nCool, isn\'t it?"});
which is the same JSON expression with all whitespace outside quoted strings removed, and whitespace and newlines inside quoted strings retained.
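If you would rather do the replacement in a single appendReplacement call, Matcher.quoteReplacement should make that safe, since it escapes any $ or \ in the matched string. Here is a minimal sketch of the same loop wrapped in a method. The class name JsonWhitespaceStripper and the method name strip are made up for illustration, not part of any library:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class JsonWhitespaceStripper {
    // Same pattern as above: a run of whitespace, or a double-quoted string captured in group 1.
    private static final Pattern CLEAN = Pattern.compile(
            " \\s+ | ( \" (?: [^\"\\\\] | \\\\ . ) * \" ) ",
            Pattern.COMMENTS | Pattern.DOTALL);

    // Hypothetical helper: removes whitespace outside double-quoted strings.
    static String strip(String json) {
        StringBuffer sb = new StringBuffer();
        Matcher m = CLEAN.matcher(json);
        while (m.find()) {
            String quoted = m.group(1);
            // group(1) is null when the match was whitespace, so that match is dropped.
            // quoteReplacement escapes $ and \ so a quoted string is copied back literally.
            m.appendReplacement(sb, quoted == null ? "" : Matcher.quoteReplacement(quoted));
        }
        m.appendTail(sb);
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(strip("{ \"a b\" : \"$1\" }")); // prints {"a b":"$1"}
    }
}
```

I stuck with StringBuffer so it compiles on older Java versions; on Java 9 or later a StringBuilder works with appendReplacement and appendTail as well.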