Bash Cut text before and after id

Question

I have this line:

<doc lang="en" func="auth" binary="/dnsmgr" host="https://dns.test.com" theme="orion" stylesheet="login" features="b993e382360bcbb508601df300594747" notify=""><auth id="4b0b5cb2210b" level="16">4b0b5cb2210b</auth><tparams><out>xml</out><username>user191642</username><func>auth</func></tparams></doc>

I need to extract the id from <auth id="4b0b5cb2210b", so my result must be: 4b0b5cb2210b

This is the result of a curl request, so the ID will be different next time. Please help me get only the id.

I have tried this to cut the line:

awk -F'level' '{print $1}' | sed 's/^\s*auth id=//' | grep 'auth id=' | sed 's/^.*: //'

and got:

<doc lang=en func=auth binary=/dnsmgr host=https://dns.test.com theme=orion stylesheet=login features=b993e382360bcbb508601df300594747 notify=><auth id=4b0b5cb2210b

I'm close enough to the result. Please help me cut the text before the exact id.

Does this answer your question? [How to grep for contents after pattern?](https://stackoverflow.com/questions/10358547/how-to-grep-for-contents-after-pattern) — Sundeep, Jul 02 '20 at 08:37

Sundeep · Answer 1 · 2020-07-02T08:49:48.953

With GNU grep

$ grep -oP '<auth id="\K[^"]+(?=")' ip.txt
4b0b5cb2210b

<auth id="\K matches <auth id=" but excludes it from the output
[^"]+ matches one or more non-" characters
(?=") makes sure there is a " after the auth id value
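
In a script the id usually has to end up in a variable. Here is a minimal sketch of that, assuming GNU grep built with -P support. The names response and id are placeholders of mine, and response holds a shortened copy of the line from the question rather than real curl output:

```shell
# Shortened sample of the line from the question; in practice this would be
# the output of the curl request.
response='<doc lang="en" func="auth"><auth id="4b0b5cb2210b" level="16">4b0b5cb2210b</auth></doc>'

# -o prints only the matched part; -P enables PCRE so that \K and (?=") work.
id=$(printf '%s\n' "$response" | grep -oP '<auth id="\K[^"]+(?=")')

printf '%s\n' "$id"
```

If nothing matches, grep exits with a non-zero status and id is empty, which a script can check before using the value.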

With sed

$ sed -E 's/.*<auth id="([^"]+)".*/\1/' ip.txt
4b0b5cb2210b

.*<auth id=" match from start of line till <auth id="
([^"]+) capture auth id value
".* rest of the line
\1 use the captured text as replacement string
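
A substitution on its own prints lines that do not contain <auth id=" unchanged. If the input may contain such lines, one option is to combine -n with the p flag so that only lines where the substitution succeeded are printed. A sketch under that assumption; the second input line is made up for illustration, and response and id are placeholder names:

```shell
# First line: shortened sample from the question.
# Second line: hypothetical line without an auth element.
response='<doc lang="en" func="auth"><auth id="4b0b5cb2210b" level="16">4b0b5cb2210b</auth></doc>
<doc>no auth element on this line</doc>'

# -n suppresses automatic printing; the trailing p prints a line only
# when the substitution was actually made on it.
id=$(printf '%s\n' "$response" | sed -nE 's/.*<auth id="([^"]+)".*/\1/p')

printf '%s\n' "$id"
```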

What went wrong with OP's attempt:

$ awk -F'level' '{print $1}' ip.txt 
<doc lang="en" func="auth" binary="/dnsmgr" host="https://dns.test.com" theme="orion" stylesheet="login" features="b993e382360bcbb508601df300594747" notify=""><auth id="4b0b5cb2210b"

The above command gives the text before level. Piping it to sed 's/^\s*auth id=//' will not work because ^ anchors the match to the start of the line, whereas the line starts with <doc. Changing ^\s* to .* will give:

$ awk -F'level' '{print $1}' ip.txt | sed 's/.*auth id=//'
"4b0b5cb2210b"

The . metacharacter matches any character, so everything up to and including auth id= will now be removed. The quotes can then be deleted using another substitution.

$ awk -F'level' '{print $1}' ip.txt | sed 's/.*auth id=//; s/"//g'
4b0b5cb2210b

This is for explanation purposes only; the simpler solutions are at the start of the answer.
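
Since the title asks about cutting the text before and after the id, the same idea can also be sketched with shell parameter expansion alone, without grep, sed or awk. This is an alternative I am adding, not part of the commands above; response, tmp and id are placeholder names, and the sample is a shortened copy of the line from the question:

```shell
# Shortened sample of the line from the question.
response='<doc lang="en" func="auth"><auth id="4b0b5cb2210b" level="16">4b0b5cb2210b</auth></doc>'

# Cut everything before the id: remove the shortest prefix ending in <auth id="
tmp=${response#*'<auth id="'}

# Cut everything after the id: remove the longest suffix starting at the first "
id=${tmp%%'"'*}

printf '%s\n' "$id"
```

This assumes the response really contains <auth id=". If it does not, the first expansion leaves the string unchanged and id will hold unrelated text, so a real script should check for the pattern first.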
