
I'm using this post as a reference for this question - How do I regex remove whitespace and newlines from a text, except for when they are in a json's string?

I have the following string in a Java program:

"stuff\n blah\n--payload {'meh': 'kar\n'}"

I'm looking for a regex to replace the newline characters in the entire string except for the ones within the JSON string. The result I'm expecting is:

"stuff blah --payload {'meh': 'kar\n'}"

The regex referenced in that post works fine for most cases, but replaces the \n within the JSON string as well. The end result I get is:

"stuff blah --payload {'meh': 'kar'}"

I've been experimenting with the following set of regexes:

^("[^"]*(?:""[^"]*)*")(\n+)  // I expected this to be a combination of newline and newline not within double quotes

[\n\r]\s*  //Match new lines, and then could possibly negate it to be within double quotes?

But I still can't handle the case where a newline character inside a JSON string value has to be preserved rather than removed. Is there a possible solution?

chrisrhyno2003
  • "Parsing" JSON with regex... won't work so well. You're in a bit of a bind unless you know for sure the JSON will be after `--payload`, in which case do this in two chunks: handle the bit before the JSON payload, and handle the bit after (which may be ignoring it), then smush the two chunks back together. – Dave Newton Aug 04 '20 at 16:12
  • In other words, you're making the problem harder than it probably is. – Dave Newton Aug 04 '20 at 16:12
  • Use a parser, not regex. It is the only way. – markspace Aug 04 '20 at 16:15
  • Why don't you just exclude the JSON first (e.g. by removing the String starting with a `{` and ending with a `}`) and afterwards remove the line breaks in the remaining String? As @DaveNewton wrote, you are overcomplicating things by handling this as one String. – T A Aug 04 '20 at 16:16
  • Good point. I could use a matcher to remove the JSON string part and then strip newlines out of the remaining string completely. – chrisrhyno2003 Aug 04 '20 at 16:17

2 Answers


I believe you're over-complicating this in two ways:

  1. Using regex for anything involving JSON.
  2. Trying to solve for the entire string at once.

JSON

Regex + JSON, like Regex + HTML TO͇̹̺ͅƝ̴ȳ̳ TH̘Ë͖́̉ ͠P̯͍̭O̚​N̐Y̡ H̸̡̪̯ͨ͊̽̅̾̎Ȩ̬̩̾͛ͪ̈́̀́͘ ̶̧̨̱̹̭̯ͧ̾ͬC̷̙̲̝͖ͭ̏ͥͮ͟Oͮ͏̮̪̝͍M̲̖͊̒ͪͩͬ̚̚͜Ȇ̴̟̟͙̞ͩ͌͝S̨̥̫͎̭ͯ̿̔̀ͅ, just don't mix.

Break the problem up

If the JSON is always at the end, and always delimited by a known string, you can:

  • Split the string at the last delimiter (--payload in your example).
  • Process the first string (strip the newlines).
  • Smush them back together.
  • Profit.
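
The steps above can be sketched roughly as follows. This is only an illustration, assuming the JSON always follows the last `--payload` in the string; the class name `NewlineStripper` and its methods are made up for this example, and collapsing each line break (plus surrounding whitespace) into one space is my own choice for reproducing the output asked for in the question.

```java
final class NewlineStripper {

    // Hypothetical helper: collapses line breaks in everything before the last
    // occurrence of `delimiter`; the delimiter and whatever follows it (the
    // JSON payload) are kept verbatim.
    static String stripNewlinesBeforePayload(String input, String delimiter) {
        int split = input.lastIndexOf(delimiter);
        if (split < 0) {
            // No delimiter found: assume there is no JSON to protect.
            return collapseLineBreaks(input);
        }
        String head = input.substring(0, split); // text before the payload
        String tail = input.substring(split);    // delimiter + JSON, untouched
        return collapseLineBreaks(head) + tail;
    }

    // Replaces every run of line breaks, together with the whitespace around
    // it, by a single space.
    private static String collapseLineBreaks(String text) {
        return text.replaceAll("\\s*[\\r\\n]+\\s*", " ");
    }

    public static void main(String[] args) {
        String input = "stuff\n blah\n--payload {'meh': 'kar\n'}";
        // The newline inside the payload survives; the ones before it do not.
        System.out.println(stripNewlinesBeforePayload(input, "--payload"));
    }
}
```

Note that `lastIndexOf` would split in the wrong place if the JSON itself happened to contain the delimiter text, so this only holds under the "always at the end, known delimiter" assumption.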
Dave Newton

This might help:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PayloadNewlines {
    public static void main(String[] args) {
        String input = "stuff\n blah\n--payload {'meh': 'kar\n'}";
        // Wanted output: "stuff blah --payload {'meh': 'kar\n'}"

        // Matches "--payload", one whitespace character, then everything up to the first '}'.
        // [^}] also matches newlines, so no DOTALL flag is needed.
        // Assumes a flat payload: a nested object or a '}' inside a value ends the match early.
        Matcher payloadMatcher = Pattern.compile("--payload\\s[^}]+\\}").matcher(input);

        String tag = "#PAYLOAD#";
        String tagged = input;   // fall back to the whole input if no payload is found
        String payload = null;
        if (payloadMatcher.find()) {
            payload = payloadMatcher.group();
            tagged = payloadMatcher.replaceFirst(tag);
        }

        // Strip the newlines outside the payload, then put the payload back.
        String result = tagged.replace("\n", "");
        if (payload != null) {
            // String.replace is literal, so '$' or '\' in the payload is not read as a group reference.
            result = result.replace(tag, " " + payload);
        }

        System.out.println(result); // Output: "stuff blah --payload {'meh': 'kar\n'}"
    }
}
DigitShifter