
I have text like the following, and I want to hide the values of only some fields: l1 and x2. Here is an example:

{
    "info":
    {
        "l1": 77,
        "x2": 77,
    },
    "user": "2323",
    "id": "xxxx",
    "time": 1679955931845,
    "msgType": "oyui"
}

I have come up with a regex that works fine in a regex tester: (?<=(l1|x2)":)(.*?)(?=,) But now I want to use it on Linux with sed, which turns out to be far more complex. In the end I got it working with two sed statements, but it bothers me that I don't know how to do it with a single regex in sed.

UPDATE

There are good answers below for anyone who stumbles upon this issue. However, in my case I specifically need a sed statement, because that is the input required for configuration in other services (in my case Splunk and its Field Filtering option with sed: https://docs.splunk.com/Documentation/Splunk/9.0.4/Security/setfieldfiltering)

GensaGames
  • Don't parse JSON with sed, use `jq` – Gilles Quénot Mar 29 '23 at 02:30
  • Okay. The sed statement is actually incoming parameter for other services, in our case Splunk in field filtering. – GensaGames Mar 29 '23 at 02:33
  • Please post valid JSON. – Cyrus Mar 29 '23 at 05:31
  • The regexp `(?<=(l1|x2)":)(.*?)(?=,)` will not work in any POSIX tools as it's not valid BRE or ERE syntax, so don't try to use it in sed or awk or any grep except possibly GNU grep with its non-standard `-P` option. You may want to clarify in your question what `The sed statement is actually incoming parameter for other services, in our case Splunk in field filtering. ` means (I personally have no idea and I suspect others won't either). – Ed Morton Mar 29 '23 at 14:31
  • `sed '-e s/"[lx][12].*//g' inputfile`. Where inputfile contains this string, which does not seem to be valid JSON. – Luuk Mar 29 '23 at 18:43
  • As part of role-based configuration, the sed expression is required: https://docs.splunk.com/Documentation/Splunk/9.0.4/Security/setfieldfiltering – GensaGames Mar 30 '23 at 06:15
  • I have also updated the question to explain why I specifically need to use `sed`. – GensaGames Mar 30 '23 at 06:17
  • The documentation you link to specifically talks about PCRE regular expressions, which are not generally supported by `sed`; the "P" stands for "Perl". – tripleee Mar 30 '23 at 08:18

2 Answers


With jq, first fix your invalid JSON (i.e. remove the comma from the end of "x2": 77,), because jq fails with an explicit error:

parse error: Expected another key-value pair at line 6, column 5

You have a comma to remove.

$ jq 'del(.info.l1, .info.x2)' file
{
  "info": {},
  "user": "2323",
  "id": "xxxx",
  "time": 1679955931845,
  "msgType": "oyui"
}

Or:

$ jq '(.info.l1, .info.x2)=""' file
{
  "info": {
    "l1": "",
    "x2": ""
  },
  "user": "2323",
  "id": "xxxx",
  "time": 1679955931845,
  "msgType": "oyui"
}
Ed Morton
Gilles Quénot

sed does not support the dialect you are trying to use. But Perl does.

perl -ne 'if (m/(?<=(l1|x2)":)(.*?)(?=,)/) { print "$1: $2\n" }'

Splunk basically borrows its regex engine from Perl (or PCRE?) so it should be convenient and natural to go back and forth between Perl and Splunk (though I should think you would never want to go back if you manage to leave ...)

Perl has some superficial similarities with sed, so you can say things like

perl -pe 's%(?<=foo)bar(?=baz)%quux%g'

which should be reasonably transparent if you are familiar with sed. There's even a tool s2p which automatically translates sed scripts to Perl scripts.

Parenthetically, many Splunk patterns seem to use named groups; you can use the built-in hash %+ in Perl to access these. 1

perl -ne 'if (m/(?<=(?P<thing>l1|x2)":)(?P<value>.*?)(?=,)/) { print "$+{thing}: $+{value}\n" }'

Perhaps see also Why are there so many different regular expression dialects?

If you genuinely need to use sed specifically, you need to refactor your regular expression to a BRE or at least an ERE. The latter is feasible if your sed has a (non-standard, but common) -r or -E option:

sed -nE 's/.*(l1|x2)":([^,]*),.*/"\1": "\2"/p'

This isn't exactly equivalent, obviously; the lookarounds have no real equivalent in traditional regex, so I just converted them to regular matches; and [^,]* isn't at all the same as .*? but in this case I'm guessing it's what you actually mean. Without seeing your actual data, it's hard to tell, but I can't imagine a scenario where the non-greedy regex would do something different. (More generally, [^,]* cannot match a comma, whereas .*? before a comma could still match a comma if that will allow the overall regex to reach a match.)
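If the goal is to hide the values rather than print them, the same conversion can be sketched as a single ERE substitution. This assumes "hide" means replacing the value with a placeholder (here `"***"`, chosen arbitrarily) and uses a hypothetical temporary file holding the sample from the question.

```shell
# Write the question's sample (including its trailing comma) to a temp file.
sample=$(mktemp)
cat > "$sample" <<'EOF'
{
    "info":
    {
        "l1": 77,
        "x2": 77,
    },
    "user": "2323",
    "id": "xxxx",
    "time": 1679955931845,
    "msgType": "oyui"
}
EOF

# Group 1 keeps the quoted key and colon; [^,]* stands in for the
# lazy .*? plus lookahead and stops at the next comma.
masked=$(sed -E 's/("(l1|x2)":)[^,]*/\1 "***"/g' "$sample")
printf '%s\n' "$masked"
rm -f "$sample"
```

The lookbehind becomes a captured prefix that is put back with `\1`, and the lookahead becomes "stop before the comma". If a target field were the last one on its line, with no comma after it, `[^,]*` would run to the end of the line, so the pattern would need adjusting for such data.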

Without more information about what exactly you are hoping the parenthesized groups should do, this can obviously only be just a hint for how to actually solve your problem.

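The corresponding BRE regex would have backslashes before each (, |, or ). Note that \| for alternation is a GNU extension rather than part of POSIX BRE, so a strictly POSIX sed would need the alternation split into separate expressions.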
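A minimal sketch of a BRE form, assuming a sed where alternation inside a BRE may not be available: each field gets its own `-e` expression within one sed invocation. The `"***"` placeholder is again an assumption.

```shell
# One of the sample lines from the question.
line='        "x2": 77,'

# BRE groups are written \( \); each field is handled by its own expression.
bre_masked=$(printf '%s\n' "$line" | sed -e 's/\("l1":\)[^,]*/\1 "***"/' -e 's/\("x2":\)[^,]*/\1 "***"/')
printf '%s\n' "$bre_masked"
```

This is still a single sed command, but not a single regex; whether the consuming service accepts more than one expression is something to check against its documentation.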


1 The hash is named %+ but an individual hash value is accessed like $+{"key"}. The mnemonic is that % is a sigil for the entire hash and $ is the sigil for a scalar such as an individual value out of the hash.

Many people are critical of Perl's "arcane" syntax but they clearly haven't seen Splunk's.

tripleee
  • That's a good answer and it will help people who want to achieve this kind of task in general; however, in my case I specifically need to use a `sed` statement for an external service. I have also updated the question with that information. – GensaGames Mar 30 '23 at 06:20
  • I added a `sed` solution but your question is somewhat vague on what exactly you want `sed` to do here. – tripleee Mar 30 '23 at 08:17
  • Kudos for the effort and explanation. – GensaGames Mar 30 '23 at 18:34