
I have the JSON below, which contains both valid and invalid text, and I want to extract the valid string from it. The valid JSON string starts with {""emp_id"" and ends at ""unit_id"":""true""}}

The starting string is fixed. However, after ""unit_id"" there may be further key/value pairs, such as ""unit_id"":""true"",""ind_val"":"1",""active_flg"":""Y"" }}, but the string will always end with two curly brackets }}

.915-0700",,,"42,795""34434",.915-0700",,,"42,795""34434",{""emp_id"":""212345"",""emp_request"":{""request_header"":{""dept_id"":""20182166008974"",""client_id"":""AP"",""medium_id"":""Web"",""country"":""US"",""request_time"":""10:59:42.719""},""mp_req"":{""contexts"":[""PO""],""user_id"":{""ref_id"":"""",""date_of_birth"":""2012-12-12"",""emp_objects"":[{""emp_number"":null,""emp_number_enc"":null,""emp_n1"":""18"",""emp_n2_enc"":null,""emp_n3"":null,""dept_enc"":null}],""char_enc"":null,""ttp"":""304""},""req_info"":{""dept_code"":""H2"",""address"":null,""grp_code"":""S000043K""}}},""additional_request"":{""tax"":"""",""unit_id"":""true""}}^""","2018-08-0607:59:42.915-0700",,,"42,795""34434","mgr_hir":"yes","34545",,,,, 

This command gives the output below:

grep -o '{""emp_id[^}]*}' file.json 
{""emp_id"":""212345"",""emp_request"":{""request_header"":{""dept_id"":""20182166008974"",""client_id"":""AP"",""medium_id"":""Web"",""country"":""US"",""request_time"":""10:59:42.719""}

I tried to specify the end pattern so that, after ""unit_id"":, there can be any number of characters or key/value pairs until }} is reached. However, it is not working; something is missing in the command below:

grep -o '{""emp_id[^}]*""unit_id"":[a-z]}}$' file.json

Please suggest.
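
For reference, three things in that last pattern look likely to prevent a match: `[a-z]` matches exactly one character, the value after ""unit_id"": starts with a quote rather than a letter, and `$` requires }} to sit at the very end of the line even though more text follows. A minimal sketch of a looser pattern, assuming GNU grep and using a shortened, made-up sample line (`line` and `extract_to_unit_id` are illustrative names, not part of the original data):

```shell
# Shortened, hypothetical sample line; real log lines are much longer.
line='junk,{""emp_id"":""1"",""additional_request"":{""tax"":"""",""unit_id"":""true"",""active_flg"":""Y""}}^""",more'

# [^}]* after unit_id accepts any further key/value pairs up to the closing }}.
# There is no $ anchor, because other text follows the JSON on the same line.
extract_to_unit_id() {
  grep -o '{""emp_id.*""unit_id"":[^}]*}}'
}

printf '%s\n' "$line" | extract_to_unit_id
```

This still depends on ""unit_id"" being present in the last object of every line.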

ashwini
  • `[a-z]` matches only a single character. Do you really have double double quotes throughout? Then extracting that will still not produce valid JSON. – tripleee Jan 31 '19 at 05:02
  • After extraction I need to replace "" with ". Before that, I just wanted to extract the valid part. – ashwini Jan 31 '19 at 05:10
  • So something like `sed 's/.*\({""emp_id[^}]*""unit_id"":[^}]*}}\).*/\1/;s/""/"/g'` maybe? Are there closing square brackets anywhere before the two? – tripleee Jan 31 '19 at 05:12
  • No closing square brackets. I am getting it now with `grep -o '{""emp_id*.*unit_id.*}}' file.json`. However, some logs do not have unit_id at the end (it is missing), so this will not work for them. – ashwini Jan 31 '19 at 05:36
  • Then just grep from `{""emp_id""` until the next occurrence of `}}`? – tripleee Jan 31 '19 at 06:02
  • Yes, I tried that and it worked! – ashwini Feb 01 '19 at 05:48
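
The approach the comments converge on could look roughly like the sketch below, assuming GNU grep and sed. The sample line is shortened and made up, and `extract_json` is an illustrative name. Note that the sample in the question contains an earlier `}}` after ""grp_code"", so a shortest match would stop too soon; the greedy `.*` used here runs to the last `}}` on the line instead.

```shell
# Shortened, hypothetical sample line with a nested }}} before the real end.
line='x,{""emp_id"":""1"",""emp_request"":{""mp_req"":{""req_info"":{""grp_code"":""S0""}}},""additional_request"":{""tax"":""""}}^""",y'

extract_json() {
  # Greedy .* extends to the last }} on the line, so unit_id is not required.
  # sed then collapses every doubled quote into a single quote.
  grep -o '{""emp_id.*}}' | sed 's/""/"/g'
}

printf '%s\n' "$line" | extract_json
```

If the text after the JSON can itself contain `}}`, the greedy match will overshoot, and a brace-counting parser would be a safer choice.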
