
I have a text file with very long lines that contain special symbols. Here is an example:

{"keyword1":["A123","D356"],"keyword2":"ENXXXXXXXXXXXXXX","keyword3":[{"name1":["R3123","L2356"],"keyword4":"text here","keyword5":"4LJ"},{"app":,"keyword6":"XX-XX-XX-XXX-XXX-Axy - Important text here","keyword7":"FBG","{[  ** text here.........}

Text in keyword2 always starts with EN followed by 14 digits. Text in keyword6 always starts with an alphanumeric code in the format XX-XX-XX-XXX-XXX-Axy, where each X is a digit from 0 to 9, A is the literal letter A, and x and y are digits from 0 to 9 that may or may not be present. "Important text here" can contain any symbol, including &, /, \, and *. Keywords may not be unique, but repeated keywords can appear only after keyword7.
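The two value formats described above can be written as POSIX extended regular expressions and checked with `grep -E`. This is a minimal sketch that assumes "may or may not be present" means zero, one, or two digits after the A; the variable and function names are hypothetical.

```shell
# Value formats from the description, as POSIX EREs.
# Assumption: the digits after "A" in the keyword6 code are optional (0 to 2).
KEYWORD2_RE='^EN[0-9]{14}$'
KEYWORD6_RE='^[0-9]{2}-[0-9]{2}-[0-9]{2}-[0-9]{3}-[0-9]{3}-A[0-9]{0,2}$'

# is_keyword2 VALUE / is_keyword6 VALUE: exit status 0 if VALUE matches.
is_keyword2() { printf '%s\n' "$1" | grep -Eq "$KEYWORD2_RE"; }
is_keyword6() { printf '%s\n' "$1" | grep -Eq "$KEYWORD6_RE"; }
```

Note that `is_keyword6` checks only the code part of keyword6, not the " - Important text here" that follows it.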

What I want to achieve is to take the data from keyword2 and keyword6 and write it to a new file with three columns separated by semicolons:

ENXXXXXXXXXXXXXX;XX-XX-XX-XXX-XXX-Axy;Important text here

I tried awk and sed, but with limited success because of all the special symbols.

  • Please use a JSON parser if you see any way of doing so: https://stackoverflow.com/questions/1955505/parsing-json-with-unix-tools – Ohumeronen Feb 12 '23 at 12:16
  • @[Ohumeronen](https://stackoverflow.com/users/2177047/ohumeronen) - Thanks for the suggestion! I read the section "Why not use awk, sed, or grep?" in your link, but unfortunately I have only a cursory knowledge of sed, awk, and grep, which is why I'm asking about these tools. Anyway, if anyone can post another solution (on Linux), that would be fine too. – HangInThere Feb 12 '23 at 16:09
  • OK, that makes perfect sense then. I see an answer has already been posted. – Ohumeronen Feb 12 '23 at 20:09

1 Answer

 # mawk, gawk, or nawk can be used in place of awk here
 echo '{"keyword1":["A123","D356"],"keyword2":"ENXXXXXXXXXXXXXX","keyword3":[{"name1":["R3123","L2356"],"keyword4":"text here","keyword5":"4LJ"},{"app":,"keyword6":"XX-XX-XX-XXX-XXX-Axy - Important text here","keyword7":"FBG","{[  ** text here.........}' |
 awk NF=NF OFS=' )\n ( ' \
     FS='^.+"keyword2":"|","keyword(3".+"keyword6":"|7".+$)| - '

 )
 ( ENXXXXXXXXXXXXXX )
 ( XX-XX-XX-XXX-XXX-Axy )
 ( Important text here )
 ( 
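To see where the ` )` and ` ( ` lines at the start and end of that output come from, it can help to print every field with its index. This sketch (the names `FS_RE` and `show_fields` are hypothetical) uses the same FS regex: the first alternative consumes everything up to keyword2's value, the second consumes either from keyword3 up to keyword6's value or from keyword7 to the end of the line, and ` - ` separates the code from the text. Because the regex matches at both the very start and the very end of the line, the first and last fields are empty.

```shell
# The answer's field separator regex (hypothetical variable name).
FS_RE='^.+"keyword2":"|","keyword(3".+"keyword6":"|7".+$)| - '

# show_fields [FILE...]: print "N:[field]" for every field of every line.
show_fields() {
  awk -F "$FS_RE" '{ for (i = 1; i <= NF; i++) printf "%d:[%s]\n", i, $i }' "$@"
}
```

Piping the sample line into `show_fields` should print five fields, with fields 1 and 5 empty; those empty fields are what produce the stray ` )` and ` ( ` around the OFS-joined output.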

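From here, getting the exact format you want should be trivial. For example, this gawk variant prints each field on its own line and strips the leading and trailing newlines produced by the empty first and last fields: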

gawk 'gsub("^\n+|\n+$",_, $!(NF = NF))^_' OFS='\n' \
        FS='^.+"keyword2":"|","keyword(3".+"keyword6":"|7".+$)| - '
ENXXXXXXXXXXXXXX
XX-XX-XX-XXX-XXX-Axy
Important text here
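In the gawk command above, `NF = NF` rebuilds the record with the newline OFS, `$!(NF = NF)` is simply `$0` (because `!NF` is 0), and `^_` raises gsub's return value to the power of the unset variable `_` (that is, 0), which always gives 1, so every record is printed. For the semicolon-separated file the question asks for, the following is one possible sketch. It assumes one record per line and reuses the answer's regex without the ` - ` alternative, splitting keyword6 at its first ` - ` with `index()` instead, so a ` - ` inside the important text is kept intact. The names `FS_RE`, `to_semicolons`, `input.txt`, and `output.txt` are hypothetical.

```shell
# Hypothetical name: the answer's regex minus its " - " alternative, so each
# line splits into: "" | keyword2 value | whole keyword6 value | ""
FS_RE='^.+"keyword2":"|","keyword(3".+"keyword6":"|7".+$)'

# to_semicolons [FILE...]: print keyword2;code;text for each line that splits
# into exactly four fields and whose keyword6 value contains " - ".
to_semicolons() {
  awk -F "$FS_RE" -v OFS=';' '
    NF == 4 {
      # split the keyword6 value at its FIRST " - " only
      p = index($3, " - ")
      if (p > 0) print $2, substr($3, 1, p - 1), substr($3, p + 3)
    }' "$@"
}
```

For example, `to_semicolons input.txt > output.txt` should write lines such as `ENXXXXXXXXXXXXXX;XX-XX-XX-XXX-XXX-Axy;Important text here`. Lines that do not match the expected layout are silently skipped, so comparing the line counts of the input and output files is a cheap sanity check.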
RARE Kpop Manifesto