
I have this line:

<doc lang="en" func="auth" binary="/dnsmgr" host="https://dns.test.com" theme="orion" stylesheet="login" features="b993e382360bcbb508601df300594747" notify=""><auth id="4b0b5cb2210b" level="16">4b0b5cb2210b</auth><tparams><out>xml</out><username>user191642</username><func>auth</func></tparams></doc>

I need to extract the id from <auth id="4b0b5cb2210b", so my result must be: 4b0b5cb2210b

This is the result of a curl request, so the id will be different next time. Please help me get only the id.

I have tried this to cut the line:

awk -F'level' '{print $1}' | sed 's/^\s*auth id=//' | grep 'auth id=' | sed 's/^.*: //'

and got:

<doc lang=en func=auth binary=/dnsmgr host=https://dns.test.com theme=orion stylesheet=login features=b993e382360bcbb508601df300594747 notify=><auth id=4b0b5cb2210b

I'm close to the result. Please help me cut the text before the id.

Fohroer
  • Does this answer your question? [How to grep for contents after pattern?](https://stackoverflow.com/questions/10358547/how-to-grep-for-contents-after-pattern) – Sundeep Jul 02 '20 at 08:37

1 Answer


With GNU grep

$ grep -oP '<auth id="\K[^"]+(?=")' ip.txt
4b0b5cb2210b
  • <auth id="\K matches <auth id=" but excludes it from the output
  • [^"]+ matches one or more non-" characters
  • (?=") is a lookahead that makes sure there is a " after the auth id value

With sed

$ sed -E 's/.*<auth id="([^"]+)".*/\1/' ip.txt
4b0b5cb2210b
  • .*<auth id=" matches from the start of the line up to and including <auth id="
  • ([^"]+) captures the auth id value
  • ".* matches the closing quote and the rest of the line
  • \1 uses the captured text as the replacement string
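
Since the line comes from curl rather than a file, the sed approach can read from a pipe and store the id in a variable. This is only a minimal sketch: the function name extract_auth_id is made up for illustration, and the sample response (shortened from the question) stands in for the real curl output.

```shell
# Hypothetical helper: read the XML response on stdin, print only the auth id.
# -n plus the trailing p prints a line only when the substitution matched,
# so a response without <auth id="..."> gives empty output.
extract_auth_id() {
    sed -n -E 's/.*<auth id="([^"]+)".*/\1/p'
}

# Sample response standing in for the real curl output (shortened from the question).
response='<doc lang="en" func="auth"><auth id="4b0b5cb2210b" level="16">4b0b5cb2210b</auth></doc>'

# Capture the id in a variable for later requests.
auth_id=$(printf '%s\n' "$response" | extract_auth_id)
printf '%s\n' "$auth_id"
```

With the real request, the printf would be replaced by the curl command that produced the line, piped into extract_auth_id the same way.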

What went wrong with OP's attempt:

$ awk -F'level' '{print $1}' ip.txt 
<doc lang="en" func="auth" binary="/dnsmgr" host="https://dns.test.com" theme="orion" stylesheet="login" features="b993e382360bcbb508601df300594747" notify=""><auth id="4b0b5cb2210b" 

The above command gives the text before level. Piping it to sed 's/^\s*auth id=//' will not work because ^ anchors the match to the start of the line, whereas the line starts with <doc. Changing ^\s* to .* will give:

$ awk -F'level' '{print $1}' ip.txt | sed 's/.*auth id=//'
"4b0b5cb2210b" 

The . metacharacter matches any character and * repeats it zero or more times, so everything up to and including auth id= is now removed. The quotes can then be deleted using another substitution.

$ awk -F'level' '{print $1}' ip.txt | sed 's/.*auth id=//; s/"//g'
4b0b5cb2210b 

This is for explanation purposes only; the simpler solutions are at the start of the answer.
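
If the line is already in a shell variable, the same "cut what is before, then cut what is after" idea can be done without awk or sed, using POSIX parameter expansion. This is an alternative sketch, not one of the commands above; the variable names line, rest and auth_id are arbitrary, and the sample line is shortened from the question.

```shell
# Sample line standing in for the curl output (shortened from the question).
line='<doc lang="en" func="auth"><auth id="4b0b5cb2210b" level="16">4b0b5cb2210b</auth></doc>'

# Drop the shortest prefix ending in <auth id=" (everything before the id).
rest=${line#*'<auth id="'}
# Drop the longest suffix starting at a double quote (everything after the id).
auth_id=${rest%%'"'*}

printf '%s\n' "$auth_id"
```

Note that when the line does not contain <auth id=" at all, the first expansion leaves it unchanged and the result is not empty, so it is worth checking for the marker first if the response can vary.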

Sundeep