
I have a file in which some lines contain a json object on a single line, and I want to extract the value of the window_indicator property.

The regular expression is: "window_indicator":\s*([\-\d\.]+) and I want the value of the first match group. Here it is working perfectly well: https://regex101.com/r/w9Iuch/1

I've settled on sed because it seems that grep has to print the whole line and can't limit to the match group value, and perl is overkill.

Unfortunately, sed isn't actually capable of doing this, is it?

# sed 's/("window_indicator:)/\1/' in.txt
sed: -e expression #1, char 26: invalid reference \1 on `s' command's RHS

# sed  -E 's/("window_indicator":)/\1/p' in.txt
prints out every line of the file

# sed  -rn 's/("window_indicator":)/\1/p' in.txt
prints the whole line

# sed  -rn 's/("window_indicator":)/\1/' in.txt
nothing
Richard Barraclough

3 Answers


With sed, you need to match the whole line, capture what you need, replace the whole match with the Group 1 placeholder, and make sure you suppress the default line output and only print the new text after a successful substitution:

sed -nE 's/.*"window_indicator":[[:space:]]*([-0-9.]+).*/\1/p' in.txt

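If only the first match is to be retrieved, quit right after the first successful substitution. Simply appending ;q would make sed quit after the first input line whether or not it matched, so restrict q to matching lines (the empty pattern in s//\1/p reuses the address regex):

sed -nE '/.*"window_indicator":[[:space:]]*([-0-9.]+).*/{s//\1/p;q;}' in.txt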

Note that \d is not supported in POSIX regex, so it is replaced with the 0-9 range in the bracket expression here.

Details

  • n - suppress default line output
  • E - enables POSIX ERE flavor
  • .*"window_indicator":[[:space:]]*([-0-9.]+).* - finds
    • .* - any text
    • "window_indicator": - a fixed string
    • [[:space:]]* - zero or more whitespaces (GNU sed supports \s, too)
    • ([-0-9.]+) - Group 1: one or more digits, - or .
    • .* - any text
  • \1 - replaces with Group 1 value
  • p - prints the result upon successful replacement
  • q - quits processing the stream.
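
As a quick check of the sed substitution described above, here is a minimal sketch run against a made-up sample file. The file contents and the variable names sample and values are assumptions for illustration, not data from the question.

```shell
# Hypothetical sample: only some lines hold a JSON object.
sample=$(mktemp)
cat > "$sample" <<'EOF'
starting up
{"id": 1, "window_indicator": -0.25}
some other log line
{"id": 2, "window_indicator":3}
EOF

# -n suppresses default output; the p flag prints only lines where s/// succeeded.
values=$(sed -nE 's/.*"window_indicator":[[:space:]]*([-0-9.]+).*/\1/p' "$sample")
printf '%s\n' "$values"

rm -f "$sample"
```

With this sample, the two non-JSON lines produce no output at all, because neither the default print nor the p flag fires for them.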

With GNU grep, it is even easier:

grep -oP '"window_indicator":\s*\K[-\d.]+' in.txt

To get only the first match:

grep -oP '"window_indicator":\s*\K[-\d.]+' in.txt | head -n 1

Here,

  • o - outputs matched texts only
  • P - enables the PCRE regex engine
  • "window_indicator":\s*\K[-\d.]+ - matches
    • "window_indicator": - a fixed string
    • \s* - zero or more whitespaces
    • \K - removes the text matched so far from the match value
    • [-\d.]+ - matches one or more -, . or digits.
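
The grep variant can be tried the same way. This is a small sketch that assumes GNU grep built with PCRE support (the -P option); the sample lines and the variable names all and first are made up for illustration.

```shell
# Hypothetical sample; requires GNU grep with -P (PCRE) support.
sample=$(mktemp)
cat > "$sample" <<'EOF'
starting up
{"id": 1, "window_indicator": -0.25}
{"id": 2, "window_indicator":3}
EOF

# \K drops the key and any whitespace from the match, so -o prints only the number.
all=$(grep -oP '"window_indicator":\s*\K[-\d.]+' "$sample")

# head -n 1 keeps just the first reported match.
first=$(grep -oP '"window_indicator":\s*\K[-\d.]+' "$sample" | head -n 1)

printf 'all:\n%s\nfirst: %s\n' "$all" "$first"
rm -f "$sample"
```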
Wiktor Stribiżew

1st solution: With your shown samples, please try the following awk code, though it is always advisable to use a JSON parser such as jq. The match function looks for the regex "window_indicator":[0-9]+} in the current line. When it matches, val is set to the matched substring, gsub then deletes "window_indicator": and } from val, and the remaining value is printed. Note that this regex only matches an unsigned integer that is immediately followed by }.

awk '
match($0,/"window_indicator":[0-9]+}/){
   val=substr($0,RSTART,RLENGTH)
   gsub(/"window_indicator":|}/,"",val)
   print val
}
'  Input_file
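
If the value can be negative, fractional, or preceded by a space, or if it is not the last property in the object, the same match/substr idea can be loosened. The following is a sketch only, with a made-up sample file and hypothetical variable names; it is a variation on the code above, not the original answer's exact method.

```shell
# Hypothetical sample: signed/decimal values, optional space, trailing properties.
sample=$(mktemp)
cat > "$sample" <<'EOF'
{"id": 1, "window_indicator": -0.25, "x": 1}
no json here
{"id": 2, "window_indicator":3}
EOF

values=$(awk '
match($0, /"window_indicator":[ \t]*[-0-9.]+/) {
    val = substr($0, RSTART, RLENGTH)           # key, optional blanks, value
    sub(/^"window_indicator":[ \t]*/, "", val)  # strip the key and blanks
    print val
}' "$sample")

printf '%s\n' "$values"
rm -f "$sample"
```

Only standard awk features (match, RSTART, RLENGTH, sub) are used here, so it should not depend on GNU awk.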


2nd solution: Using GNU grep with a positive lookbehind and a positive lookahead, so that only the digits between "window_indicator": and } are printed.

grep -oP '(?<="window_indicator":)\d+(?=})' Input_file
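
One caveat with the lookbehind form: it expects the digits directly after the colon, and a PCRE lookbehind cannot contain an unbounded quantifier such as \s*, so optional whitespace cannot simply be added inside it. The sketch below, using a made-up input line and hypothetical variable names, contrasts it with a \K-based pattern that tolerates the space. It assumes GNU grep with -P support.

```shell
# Hypothetical input with a space after the colon.
line='{"id": 1, "window_indicator": 7}'

# The lookbehind requires the digits right after the colon, so nothing matches here.
strict=$(printf '%s\n' "$line" | grep -oP '(?<="window_indicator":)\d+(?=})' || true)

# \K allows optional whitespace before the value and still prints only the digits.
loose=$(printf '%s\n' "$line" | grep -oP '"window_indicator":\s*\K\d+(?=\s*})')

printf 'lookbehind: [%s]\nK form:     [%s]\n' "$strict" "$loose"
```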
RavinderSingh13

Using sed (-n together with the p flag prints only the lines where the substitution succeeded)

$ sed -nE 's/.*window_indicator":([0-9]+).*/\1/p' input_file
0

Using grep

$ grep -Po '.*window_indicator":\K\d+' input_file
0

Using GNU awk (the three-argument form of match is a gawk extension)

$ awk 'match($0,/.*window_indicator":([0-9]+)/,arr){print arr[1]}' input_file
0
HatLess