Use grep to extract text between two words

Question

I have a file:

{
   "test1": [
        "test_a",
        "test_b",
        "test_c"
   ]
}

I am trying to extract the text that lies between "test1": [ and ]. I'm trying this command:

cat test | grep -o -P '(?<=test": [).*(?=])'

But it doesn't work. Any ideas?

Thanks!

score 3 · Answer 1 · answered Jan 23 '18 at 22:15

Simply with jq tool:

jq -r '.test1[]' testfile

The output:

test_a
test_b
test_c

answered Jan 23 '18 at 22:15

RomanPerekhrest


Upvoting for jq... This is the right tool for this job. – erip Jan 23 '18 at 22:17

Mark Maurice Williams · Answer 2 · 2018-01-23T21:18:22.780

2

grep is not the best tool for this particular job, but if you must use it, this works:

cat test | grep -Pzo '(?s)(?<=test1\": \[)[^\]]*(?=\])'

With the input you specified above, the output of this command is:

    "test_a",
    "test_b",
    "test_c"

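The -z option makes grep treat the input as NUL-separated records instead of lines, so the whole file is read as a single record and the pattern can match across newlines. The negated class [^\]] matches newline characters on its own; the (?s) flag only changes the behavior of . and is therefore harmless but not strictly needed here.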
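One side effect of -z is that grep also terminates each printed match with a NUL byte rather than a newline. A minimal sketch of handling that, assuming GNU grep built with PCRE support (-P) and a hypothetical file name sample.json:

```shell
# Recreate the sample input from the question (the file name is an assumption).
cat > sample.json <<'EOF'
{
   "test1": [
        "test_a",
        "test_b",
        "test_c"
   ]
}
EOF

# Same pattern as above; tr removes the NUL byte that -z appends to the match.
matched=$(grep -Pzo '(?s)(?<=test1\": \[)[^\]]*(?=\])' sample.json | tr -d '\0')

printf '%s\n' "$matched"
```

The match still carries the newline after the opening bracket and the indentation before the closing one, so further trimming may be wanted depending on the use case.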

The jq utility is designed for what you're trying to do:

cat test | jq '.["test1"]'

edited Jan 23 '18 at 21:18

answered Jan 23 '18 at 17:56

Mark Maurice Williams


Very nice solution. I did not know the `-z` option. To improve the post, could you update the formatting and maybe show the output? – kvantour Jan 23 '18 at 18:18

kvantour · Answer 3 · 2018-01-24T08:53:55.357

Update: unexpectedly, grep is able to match across multiple lines after all; see the other answers. And jq is really the right tool for the job.

Nonetheless, here is an awk solution:

$ awk '/]/{p=0}p{print}/test1/{p=1}' test 
    "test_a",
    "test_b",
    "test_c"

Or, a bit more generic:

$ awk 'BEGIN{RS="\"test1\": \\[\n|\n[[:blank:]]*\\]"}(RT~/]/){print}' test
    "test_a",
    "test_b",
    "test_c"

The first solution searches for test1 and sets a marker to print (p=1). If it finds a ] it will set the print marker to zero.

The second solution defines the record separator to be either \"test1\": \\[\n or \n[[:blank:]]*\\]. It then checks which separator actually terminated the record (RT) and prints the record only if that separator contains a ]. Note that RT is a GNU awk (gawk) extension.
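The flag logic of the first one-liner can be spelled out with one rule per line. This is a minimal sketch, assuming the sample JSON is saved under the hypothetical name sample.json:

```shell
# Recreate the sample input from the question (the file name is an assumption).
cat > sample.json <<'EOF'
{
   "test1": [
        "test_a",
        "test_b",
        "test_c"
   ]
}
EOF

# Rule order matters: the closing bracket clears the flag before the print
# rule runs, and the opening line sets the flag only after the print rule,
# so neither delimiter line is printed.
result=$(awk '
  /]/     { p = 0 }   # closing bracket: stop printing
  p       { print }   # print while the flag is set
  /test1/ { p = 1 }   # opening line: start printing from the next line
' sample.json)

printf '%s\n' "$result"
```

This version uses only standard awk features, so it should not need gawk.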

I know, I just kept it in the style of the OP. But you are correct, I'll update — kvantour, Jan 23 '18 at 17:54

stevesliva · Answer 4 · 2018-01-23T18:14:35.210

sed -n '/"test1": \[/,/\]/{//!p}' test

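sed -n prints lines from the pattern buffer (the possibly modified input stream) only when the p command is used.
/"test1": \[/,/\]/{ ... } applies the commands in braces to every line from the first pattern through the second, using the /START/,/END/{ ... } syntax.
//!p prints the line only if it does not match the most recently used regular expression (an empty // reuses it), which skips the START and END lines themselves.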

The generic form is sed -n '/START/,/END/{//!p}' input-file to omit START and END lines. Or simply sed -n '/START/,/END/p' input-file if you want them.
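Both generic forms can be tried on a small made-up input. This is only a sketch: the START/END markers and the file name markers.txt are hypothetical, and a semicolon is added before the closing brace because some sed implementations (BSD sed, for example) require it:

```shell
# Hypothetical input: marker words and file name are made up for illustration.
printf '%s\n' 'header' 'START' 'alpha' 'beta' 'END' 'footer' > markers.txt

# Exclude the delimiter lines: inside the range, // reuses the last regex
# that was tested, so //!p skips the START and END lines themselves.
inner=$(sed -n '/START/,/END/{//!p;}' markers.txt)

# Include the delimiter lines.
outer=$(sed -n '/START/,/END/p' markers.txt)

printf '%s\n' "$inner"
printf '%s\n' "$outer"
```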
