-1

Some JSON:

{"theSecret":"flhiusdh4543sdfasf34sdf+fsfs/sdf43454=","foo":{"bar":41,"something":"hello world","else":"hi","date":20230101,"qux":0,"digest":"sfhusdf234jkkjuiui23kjkj/SFDF34SDSFDS="}}

Prettified:

{
  "theSecret": "flhiusdh4543sdfasf34sdf+fsfs/sdf43454=",
  "foo": {
    "bar": 41,
    "something": "hello world",
    "else": "hi",
    "date": 20230101,
    "qux": 0,
    "digest": "sfhusdf234jkkjuiui23kjkj/SFDF34SDSFDS="
  }
}

I want the value of the theSecret key.

I tried this:

$ echo "$json" | sed -nE 's/.*theSecret":"(.*)".*/\1/p'    # (.*?) doesn't work

Which gives:

flhiusdh4543................4SDSFDS=

i.e. from theSecret until the end of digest. That's because sed lacks non-greedy quantifiers, so .*? doesn't work.

How can I do this?

(This is in an alpine container, which has sed, grep and awk. jq and perl are unavailable.)

Abdul Aziz Barkat
  • 19,475
  • 3
  • 20
  • 33
lonix
  • 14,255
  • 23
  • 85
  • 176
  • Don't use regex to process JSON. Use a tool like `jq` that knows how to parse it. – Barmar Jul 01 '23 at 04:06
  • @Barmar True. But this is in a container without jq. – lonix Jul 01 '23 at 04:07
  • If it lacked `?` then it wouldn't match at all. – Barmar Jul 01 '23 at 04:07
  • @Barmar I read in other questions that sed lacks the non-greedy quantifier `.*?`. From my test above that seems to be the case. Have I misunderstood, or made a mistake? – lonix Jul 01 '23 at 04:10
  • Instead of non-greedy, use `[^"]*` – Barmar Jul 01 '23 at 04:11
  • @Barmar "If it lacked ? then it wouldn't match at all." - no. `sed` lacks support of lazy quantifiers, but their presence doesn't break matching: it simple disregards question mark after quantifier, and proceeds with greedy one. Consider this [demo](https://ideone.com/6wljQQ) – markalex Jul 01 '23 at 07:10
  • does your grep understand `-P` and `-o`? If so and assuming the value is base64-encoded, `grep -Po '"theSecret":"\K[^"]*' – jhnc Jul 01 '23 at 15:42
  • with sed: `sed -n '/.*"theSecret":"/{s///;s/".*//p;}' – jhnc Jul 01 '23 at 15:45
  • @markalex It's very weird that it accepts them and ignores them, but I agree that it's what the behavior shows. – Barmar Jul 01 '23 at 19:03
  • 1
    If you feel the duplicate closure of a question isn't correct please edit the question to **explain why the duplicate doesn't apply**. Don't add in comments about the close voter, etc. If you feel there is some problematic pattern with a particular user please **flag for moderator attention**. If you feel there is some discussion needed in the community please take it to [meta] – Abdul Aziz Barkat Jul 24 '23 at 07:46
  • 1
    If you are editing the question to contest the dupe closure the **only** thing you should add is the explanation of why the target doesn't answer your question (or some clarification / information actually relevant to the question). All of the other commentary you added to your question does not belong there. Again as I mentioned if you want a discussion take it to [meta]. – Abdul Aziz Barkat Jul 24 '23 at 12:07

1 Answers1

1

Instead of using a non-greedy quantifier, use a pattern that doesn't match the terminating ".

$ echo "$json" | sed -nE 's/.*theSecret":"([^"]*)".*/\1/p'    # (.*?) doesn't work
Barmar
  • 741,623
  • 53
  • 500
  • 612
  • 1
    `[^"]*` matches the longest possible sequence of non-quote characters, which necessarily has to stop before the `"` character. That character is then matched by the `"` after the capture group. – Kaz Jul 01 '23 at 05:44
  • for this particular case it looks like the value is base64-encoded but, in general, this regex is broken unless you extend it to handle escaped quotes: `'{"theSecret":"mylovely\"secret\""}` eg. `.*theSecret":"(([^\\"]|\\")*)".*` – jhnc Jul 01 '23 at 15:37
  • Trying to use regular expressions to parse arbitrary JSON is hopeless. It's only feasible when you know that the data is restricted like this. – Barmar Jul 01 '23 at 19:30