I have a system.log file that looks like this:

[2019-12-20 09:06:40] main.INFO: Update Product Attributes [] []
[2019-12-20 09:18:56] main.INFO: Customer Id: . Param: {"store":101,"search":"soap"} [] []
[2019-12-20 09:19:32] main.INFO: Update Product Attributes [] []
[2019-12-20 09:20:34] main.INFO: Customer Id: . Param: {"store":101,"search":"ea"} [] []
[2019-12-20 09:23:29] main.INFO: Customer Id: . Param: {"store":101,"search":"C2"} [] []
[2019-12-20 09:23:31] main.INFO: Update Product Attributes [] []
[2019-12-20 09:23:43] main.INFO: Customer Id: . Param: {"store":101,"search":"spaghetti"} [] []
[2019-12-20 09:24:06] main.INFO: Customer Id: . Param: {"store":101,"search":"Ea"} [] []

I want to extract the date and the value of search from each matching line, like this:

2019-12-20 "soap"
2019-12-20 "ea"
2019-12-20 "C2"
2019-12-20 "spaghetti"
2019-12-20 "Ea"

So far I've tried this:

awk -F '] main.INFO: Customer Id: . Param: {"store"' '{ if ( $2 ~ /search/ ) { print $1 $2} }' system.log

but it returns the following, and I can't work out how to split the line any further:

[2019-12-20 10:08:04:101,"search":"ea"} [] []
[2019-12-20 10:08:35:101,"search":"ea"} [] []
anubhava

6 Answers

Could you please try the following, written and tested with the shown samples in GNU awk.

awk -v s1="\"" '
/Customer Id/{
  match($0,/Param: {.*}/)
  val=substr($0,RSTART,RLENGTH)
  gsub(/.*:"|"}$/,"",val)
  sub(/\[/,"",$1)
  print $1,s1 val s1
  val=""
}'  Input_file

Explanation: a detailed, line-by-line explanation of the above.

awk -v s1="\"" '                     ## Start the awk program and set variable s1 to a double quote.
/Customer Id/{                       ## Run this block only for lines containing "Customer Id".
  match($0,/Param: {.*}/)            ## Match from "Param: {" through the last "}"; this sets RSTART and RLENGTH.
  val=substr($0,RSTART,RLENGTH)      ## Store in val the matched substring: RLENGTH characters starting at RSTART.
  gsub(/.*:"|"}$/,"",val)            ## Remove everything up to the last :" and the trailing "} from val.
  sub(/\[/,"",$1)                    ## Remove the leading [ from the first field.
  print $1,s1 val s1                 ## Print the first field, then val wrapped in double quotes.
  val=""                             ## Reset val.
}' Input_file                        ## Name of the input file.
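
To make the match()/substr() step concrete, here is a minimal sketch that runs it on a single sample line and prints the intermediate values. The function name show_match is hypothetical, and the braces are written as [{] and [}] only as an assumption that this helps awk implementations other than GNU awk; it is not part of the answer above.

```shell
# Hypothetical helper: show what match() captures for one log line,
# and what is left after the gsub() cleanup.
show_match() {
  printf '%s\n' "$1" | awk '{
    # match() sets RSTART (start index) and RLENGTH (match length).
    if (match($0, /Param: [{].*[}]/)) {
      val = substr($0, RSTART, RLENGTH)
      print val                       # the raw matched substring
      gsub(/.*:"|"[}]$/, "", val)     # strip up to the last :" and the trailing "}
      print val                       # the bare search value
    }
  }'
}

show_match '[2019-12-20 09:18:56] main.INFO: Customer Id: . Param: {"store":101,"search":"soap"} [] []'
```

The first printed line is the whole Param: {...} substring and the second is the bare search term, which the answer then wraps in s1 to restore the quotes.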


2nd solution: one more approach, which extracts the date with match() instead of relying on the first field.

awk -v s1="\"" '
/Customer Id:/{
  match($0,/\[[0-9]{4}-[0-9]{2}-[0-9]{2}/)
  dat=substr($0,RSTART+1,RLENGTH-1)
  match($0,/Param: {.*}/)
  val=substr($0,RSTART,RLENGTH)
  gsub(/.*:"|"}$/,"",val)
  print dat,s1 val s1
  dat=val=""
}
'  Input_file

Explanation: a detailed, line-by-line explanation of the above.

awk -v s1="\"" '                                     ## Start the awk program and set s1 to a double quote.
/Customer Id:/{                                      ## Run this block only for lines containing "Customer Id:".
  match($0,/\[[0-9]{4}-[0-9]{2}-[0-9]{2}/)           ## Match the leading [ followed by a YYYY-MM-DD date.
  dat=substr($0,RSTART+1,RLENGTH-1)                  ## Store the date in dat, skipping the leading [.
  match($0,/Param: {.*}/)                            ## Match from "Param: {" through the last "}".
  val=substr($0,RSTART,RLENGTH)                      ## Store in val the substring found by the previous match.
  gsub(/.*:"|"}$/,"",val)                            ## Remove everything up to the last :" and the trailing "} from val.
  print dat,s1 val s1                                ## Print dat, then val wrapped in double quotes.
  dat=val=""                                         ## Reset dat and val so values do not carry over to the next line.
}
' Input_file                                         ## Name of the input file.
RavinderSingh13

Keeping it simple:

$ awk '
match($(NF-2),/\"[^"]*\"\}/) {
    print substr($1,2),substr($(NF-2),RSTART,RLENGTH-1)
}' file

Output:

2019-12-20 "soap"
2019-12-20 "ea"
2019-12-20 "C2"
2019-12-20 "spaghetti"
2019-12-20 "Ea"

Explained:

If the antepenultimate space-separated string has a substring "..."}, print the first space-separated string starting from the second character (excluding the first character [) and the above-mentioned substring excluding the last character }.
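
As a rough illustration of what $(NF-2) holds, the sketch below prints the third-from-last whitespace-separated field for two lines. The helper name third_from_last and the second sample line (a search term containing a space) are assumptions made for illustration; they do not come from the log in the question.

```shell
# Hypothetical helper: print the third-from-last whitespace-separated field.
third_from_last() {
  awk 'NF >= 3 { print $(NF-2) }' "$@"
}

# A line from the question: the whole JSON object is a single field.
printf '%s\n' '[2019-12-20 09:18:56] main.INFO: Customer Id: . Param: {"store":101,"search":"soap"} [] []' | third_from_last

# Assumed line with a space in the search term: the JSON is split across fields.
printf '%s\n' '[2019-12-20 09:18:56] main.INFO: Customer Id: . Param: {"store":101,"search":"hand soap"} [] []' | third_from_last
```

For the second line the field is only the tail of the JSON object, so the "..."} pattern finds no opening quote and the line is skipped. The approach therefore seems best suited to logs where search terms never contain whitespace.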

James Brown

You may use this GNU awk solution with FPAT:

awk -v FPAT='\\[[^]]+]|{[^}]+}' '
/main\.INFO: / && $2 ~ /"search":/ {
    gsub(/^\[| .*$/, "", $1)
    gsub(/^.*:|}$/, "", $2)
    print $1, $2 
}' file

Output:

2019-12-20 "soap"
2019-12-20 "ea"
2019-12-20 "C2"
2019-12-20 "spaghetti"
2019-12-20 "Ea"
anubhava

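Just use Perl, as in https://stackoverflow.com/a/2957781/1921546. The command below reads the log from standard input and prints the search value without its surrounding quotes.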

perl -n -e '/^\[([^ ]*).*search":"((?:[^"]|\\.)*)"/ && print "$1 $2\n"'

Explanation of Regular Expression used at https://regexr.com/57nhk

pii_ke

With sed

$ sed -nE 's/^\[([^ ]+).*"search":("[^"]+").*/\1 \2/p' ip.txt
2019-12-20 "soap"
2019-12-20 "ea"
2019-12-20 "C2"
2019-12-20 "spaghetti"
2019-12-20 "Ea"
  • -n turn off auto print
  • -E enable ERE
  • ^\[ match the starting [
  • ([^ ]+) capture the date
  • .*"search": match till "search":
  • ("[^"]+") capture the value of search
  • .* rest of the line
  • \1 \2 text matched by the capture groups separated by space
  • p print only if substitution succeeds
Sundeep

I think this can be simplified to the following, which should work in gawk, mawk, and mawk2:

awk 'BEGIN { FS = "([}]|search\"[:])"; OFS = " " }
     NF > 1 { print substr($1, 2, index($1, OFS) - 2), $2 }' system.log
RARE Kpop Manifesto