
I want to remove all comments inside a JSON object except the ones that are inside a string. For example:

{
//Remove this comment
    "Command": "storeSystemConfig",
    "SystemId": "1234", //Remove this comment

        /*Remove this and the empty line above and below*/

/*This can be removed but not what is behind here =>*/ "TestParam": "Hello",
    "TestString": "Do not remove this comment /*do not remove*/ and also this one: //Test comment"
}

I now use the following regular expression:

#(\\*([^*]|[\r\n]|(\*+([^*\/]|[\r\n])))*\*+\/)|([\s\t]\/.*)|(^\/.*)#

Unfortunately, this expression also removes the comment inside the ‘TestString’ parameter. Here you can see how this expression handles the JSON data: https://regex101.com/r/65VL8v/1 and here is my PHP source in a working environment: https://ideone.com/F4v20p

Micket
  • As much as I love to solve the entire problem in one line of regex, even I have to admit this is tricky. Are you unable to use multiple regex substitutions for some reason? It would be much simpler to remove different types of comments in different regex subs. – Robo Mop Sep 17 '20 at 09:06
  • If it can be done with multiple regex lines or other PHP code, that would also be fine. So to answer your question, there is no particular reason to do it with only one regex line. – Micket Sep 17 '20 at 09:31
  • I fixed the single line comment after a multi line comment problem, hope it works now! – Robo Mop Sep 18 '20 at 11:00
  • https://stackoverflow.com/q/52226541/438992, https://stackoverflow.com/q/19910002/438992 – Dave Newton Sep 18 '20 at 16:58

1 Answer


Here's my attempt:

<?php
 
$json_origen = <<<'JSON'
{
//Remove this comment
    "Command": "storeSystemConfig", /*1234*/
    "SystemId": "1234", //Remove this comment
 
        /*Remove this and the 
        empty line above and below*/
 
/*This can be removed but not what behind here =>*/ "TestParam": "Hello",
    "TestString": "DNR this comment /*don not remove*/ and also this one: //Test comment" /*4321*/ //1234
}
JSON;
 
//Remove lines with only single line comments
$json = preg_replace("/[\n\r]\s*\/\/.*/", "", $json_origen);
//Remove all lines with only multi line comments
$json = preg_replace("/(?<=[\n\r])\s*\/\*(.[\n\r]?)*?\*\/\s*?/", "", $json);
//Remove multi-line comments at the end of a line
$json = preg_replace("/(\".+?(?<!\\\\)\"\s*,?)\s*\/\*(.[\n\r]?)*?\*\/\s*?/", "\\1", $json);
//Remove single-line comments at the end of a line
$json = preg_replace("/(\".+?(?<!\\\\)\"\s*,?)\s*\/\/.*?(?=[\n\r])/", '\\1', $json);
//Remove empty lines
$json = preg_replace('/\n\s*\n/', "\n", $json);
 
echo($json);
 
?>

There's also the issue of multi-line comments after a normal JSON statement, but I have to write my uni exams now lol; I'll update this answer for it soon. For the sample input, though, this should work.

Lemme know if there are any other edge cases that might occur in your JSON.
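One way to catch such cases early (a sketch, not part of the original answer) is to run the cleaned output through PHP's json_decode(), which accepts neither comments nor trailing commas, so any comment that slips through makes decoding fail. The helper name isValidJson and the case strings below are hypothetical placeholders; feed in the real output of the cleanup code instead.

```php
<?php
// Sketch: check whether a cleaned string is valid JSON.
// isValidJson is a hypothetical helper name.
function isValidJson(string $json): bool
{
    json_decode($json);
    return json_last_error() === JSON_ERROR_NONE;
}

// Hypothetical inputs; replace them with the output of your own cleanup code.
$cases = [
    'clean object'           => "{\n    \"a\": [1, 2],\n    \"b\": \"//not a comment\"\n}",
    'comment after array'    => "{\n    \"a\": [1, 2] //left over\n}",
    'leftover block comment' => "{\n    \"a\": 1 /*left over*/\n}",
];

foreach ($cases as $name => $json) {
    printf("%-24s %s\n", $name, isValidJson($json) ? 'valid' : 'INVALID');
}
```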


EDIT 1: Solved a possible problem where a value could contain escaped double quotes (\"): the negative lookbehind (?<!\\\\) makes sure escaped double quotes don't count as the end of a string.
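A small sketch of what that lookbehind changes, using a hypothetical input line. It also shows one limitation worth knowing: a value that ends in an escaped backslash (such as "C:\\") is misread, because its real closing quote is also preceded by a backslash.

```php
<?php
// Hypothetical line containing escaped quotes inside a value.
$line = '"say \"hi\" now", // comment';

// Without the lookbehind, the lazy .+? stops at the first quote it reaches.
preg_match('/".+?"/', $line, $plain);

// With (?<!\\) a quote preceded by a backslash cannot close the match.
preg_match('/".+?(?<!\\\\)"/', $line, $guarded);

echo $plain[0], PHP_EOL;   // "say \"
echo $guarded[0], PHP_EOL; // "say \"hi\" now"

// Limitation: in "C:\\" the closing quote follows a (escaped) backslash,
// so it is rejected as well and the match runs on to the next quote.
preg_match('/".+?(?<!\\\\)"/', '"C:\\\\", "x"', $edge);
echo $edge[0], PHP_EOL;    // "C:\\", "
```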

EDIT 2: Fixed the multi-line-comment-after-normal-json-statements problem I talked about.

EDIT 3: I provided the code but not a detailed explanation, so: the concepts I used here are positive and negative lookbehinds and lookaheads. Also, I have a habit of using [\n\r] instead of just \n because other problems might occur otherwise (e.g. with files that use \r\n line endings) lol
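To illustrate the lookahead part with a hypothetical CRLF input: (?=[\n\r]) stops the lazy .*? at the first line-break character without consuming it, so removing a trailing // comment leaves the \r\n intact. With the usual default where only \n counts as a newline, . also matches \r, so a plain \/\/.* would swallow the \r; that is one concrete case where [\n\r] matters.

```php
<?php
// Hypothetical input with Windows (\r\n) line endings.
$crlf = "\"a\": 1, // note\r\n\"b\": 2\r\n";

// The lookahead only checks that \n or \r follows; it does not consume it.
$out = preg_replace('/\/\/.*?(?=[\n\r])/', '', $crlf);

var_dump($out === "\"a\": 1, \r\n\"b\": 2\r\n"); // bool(true)
```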

EDIT 4: There was an issue where a single-line comment after a multi-line comment was not removed if both were on the same line. Fixed that by simply changing the order of the regex removals.
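The effect of the order can be reproduced with the two end-of-line patterns copied from the answer (the variable names and the sample line are hypothetical). The // pattern only looks for a comment right after a value and an optional comma, so it cannot see past a /* */ comment that is still in the way.

```php
<?php
// End-of-line patterns copied from the answer.
$multiEnd  = "/(\".+?(?<!\\\\)\"\s*,?)\s*\/\*(.[\n\r]?)*?\*\/\s*?/";
$singleEnd = "/(\".+?(?<!\\\\)\"\s*,?)\s*\/\/.*?(?=[\n\r])/";

// Hypothetical line: a /* */ comment followed by a // comment.
$line = "\"Command\": \"storeSystemConfig\", /*1234*/ //x\n";

// Multi-line comment first: afterwards //x directly follows the value.
$good = preg_replace($singleEnd, '\\1', preg_replace($multiEnd, '\\1', $line));

// Single-line comment first: /*1234*/ blocks the // pattern, so //x survives.
$bad = preg_replace($multiEnd, '\\1', preg_replace($singleEnd, '\\1', $line));

echo $good; // "Command": "storeSystemConfig",
echo $bad;  // "Command": "storeSystemConfig", //x
```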

EDIT 5: Fixed the multi-line-comment-after-JSON-statement issue; I just needed to allow for a possible comma after the statement.

Robo Mop
  • Thank you for this multiple-regex solution. For the example it works fine, but I found out that multi-line comments at the end were not removed. See: https://ideone.com/91t5aF – Micket Sep 17 '20 at 11:34
  • Sorry, you already mentioned that. I also mean single-line comments following multi-line comments. – Micket Sep 17 '20 at 11:42
  • Oh yeah, that's an issue. I should be able to fix that in 10 minutes or so; sorry for any inconvenience! – Robo Mop Sep 18 '20 at 10:56
  • Don't be sorry, I'm very pleased with the solution you've already come up with, but to make it perfect, please find a way to remove multi-line comments directly after JSON statements. (See line 6 in this example: https://ideone.com/Zx3wdy ) The comment "/*1234*/" is still slipping through. – Micket Sep 18 '20 at 11:28
  • I edited the answer again, it should work now (I hope) – Robo Mop Sep 18 '20 at 11:34
  • Excellent, this one passed the test completely. Thank you very much. :) – Micket Sep 18 '20 at 11:47
  • HELL YES, I'm glad I could help :) Also, if it solved your problem completely, consider upvoting it and marking it as correct - it would help a lot – Robo Mop Sep 18 '20 at 11:50
  • Now that I use your solution in practice, I found a single-line comment that still comes through. It happens when there is a single-line comment after an array. (See line 6 in https://ideone.com/xrOyIq ) Hopefully you have a fix :) – Micket Sep 24 '20 at 12:35
  • Also, a multi-line comment slips through again. (See: https://ideone.com/m6cNhh ) It seems I cheered too soon. – Micket Sep 24 '20 at 12:49
  • If you have time, could you please check these issues? – Micket Oct 02 '20 at 06:48
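The follow-up reports above (a // comment after an array, a multi-line comment slipping through) are the kind of case a chain of line-oriented regexes keeps missing. A different technique, which is not what the answer above uses: match string literals and comments in a single left-to-right pass with preg_replace_callback(), keep every match that is a string, and drop every match that is a comment. Because each string is consumed as a whole token, comment markers inside it are never seen. This is a sketch under assumptions: the function name stripJsonComments is hypothetical, it assumes well-formed JSON strings (no raw line breaks, backslash escapes only), and it does not remove trailing commas.

```php
<?php
// Sketch: remove // and /* */ comments from "JSON with comments" while
// leaving string literals untouched. stripJsonComments is a hypothetical name.
function stripJsonComments(string $input): string
{
    // 1) a JSON string literal; \\. skips escaped characters such as \"
    // 2) a // comment up to, but not including, the line break
    // 3) a /* */ comment, possibly spanning lines (the s flag lets . match \n)
    $pattern = '~"(?:[^"\\\\]|\\\\.)*"|//[^\r\n]*|/\*.*?\*/~s';

    $out = preg_replace_callback($pattern, function (array $m): string {
        // Keep string literals, drop comments.
        return $m[0][0] === '"' ? $m[0] : '';
    }, $input);

    // Trim spaces/tabs left at line ends where comments were removed.
    $out = preg_replace('/[ \t]+(?=\R|\z)/', '', $out);

    // Drop lines that are now empty.
    return preg_replace('/^[ \t]*\R/m', '', $out);
}

// Hypothetical sample covering the reported cases.
$sample = "{\n//c\n    \"Command\": \"storeSystemConfig\", /*1234*/ //x\n"
        . "    \"list\": [1, 2], // after an array\n\n"
        . "        /* multi\n           line */\n"
        . "/* lead */ \"TestString\": \"keep /*this*/ and //this\"\n}";

echo stripJsonComments($sample), PHP_EOL;
```

On the question's sample, this should leave "TestString" untouched and produce output that json_decode() accepts.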