
I have a long string like the following:

string='<span id="/yourid/12345" class="noname">lala1</span><span id="/yourid/34567" class="noname">lala2</span><span id="/yourid/39201" class="noname">lala3</span>'

The objective is to loop through each 'yourid' entry and echo its id (12345, 34567 and 39201) for further processing. How can this be achieved in the bash shell?

Armitage
    bash might be a bad choice. If you can, go with a language which has XML support such as Perl, Python, or TCL. – Hai Vu Jul 02 '13 at 02:39

3 Answers


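With GNU grep (-o prints only the matched text; -P enables Perl-compatible regular expressions, which the (?<=...) lookbehind needs):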

grep -oP '(?<=/yourid/)\d+' <<< "$string"
12345
34567
39201
glenn jackman
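For the "loop through each id" part of the question, one option is to feed grep's output into a while read loop. A minimal sketch, assuming GNU grep built with PCRE support (-P); extract_ids is a hypothetical helper name, and the printf line only stands in for whatever further processing is needed:

```shell
#!/usr/bin/env bash
# Sketch only: assumes GNU grep with PCRE support (-P).
string='<span id="/yourid/12345" class="noname">lala1</span><span id="/yourid/34567" class="noname">lala2</span><span id="/yourid/39201" class="noname">lala3</span>'

# Hypothetical helper: print each run of digits preceded by /yourid/, one per line.
extract_ids() {
  grep -oP '(?<=/yourid/)\d+' <<< "$1"
}

# Read the ids one at a time; replace the printf with the real processing.
while IFS= read -r id; do
  printf 'processing %s\n' "$id"
done < <(extract_ids "$string")
```

Using process substitution rather than a pipe keeps the loop in the current shell, so any variables set inside it are still visible after done.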

Use a real XML parser. For instance, if you have XMLStarlet installed:

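# Wrap the fragment in a single root element so it is well-formed XML,
# print the id attribute of every <span> that has one, then strip the prefix.
while read -r id; do
  [[ $id ]] || continue            # skip blank lines
  printf '%s\n' "${id#/yourid/}"   # keep only the number after /yourid/
done < <(xmlstarlet sel -t -m '//span[@id]' -v ./@id -n <<<"<root>${string}</root>")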
Charles Duffy
  • +1 xmlstarlet may not be as likely to be installed, but it's convenient that you can use ad hoc xpath expressions on the command-line. AFAIK, xsltproc requires you to use a stylesheet file. – Bill Karwin Jul 02 '13 at 02:44

With Perl:

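# Collect every id that follows "yourid/" into a bash array (mapfile needs bash 4+);
# unlike ids=( $(...) ), this avoids word splitting and glob expansion of the output.
mapfile -t ids < <(perl -lne 'while (m!yourid/(\w+)!g) { print $1 }' <<< "$string")
echo "${ids[@]}"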
perreal
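If no external tool is available, bash's own =~ operator and the BASH_REMATCH array can do the same extraction. A minimal sketch, assuming the attribute always appears exactly as id="/yourid/<digits>" in double quotes; like the grep and Perl answers, it treats the markup as plain text rather than parsing XML, and yourid_ids is a hypothetical helper name:

```shell
#!/usr/bin/env bash
# Pure-bash sketch: no grep, perl or XML parser; uses [[ =~ ]] and BASH_REMATCH.
string='<span id="/yourid/12345" class="noname">lala1</span><span id="/yourid/34567" class="noname">lala2</span><span id="/yourid/39201" class="noname">lala3</span>'

# Hypothetical helper: print every number that follows id="/yourid/ in $1, one per line.
yourid_ids() {
  local rest=$1
  local re='id="/yourid/([0-9]+)"'
  while [[ $rest =~ $re ]]; do
    printf '%s\n' "${BASH_REMATCH[1]}"
    # Drop everything up to and including this match, then search the remainder.
    rest=${rest#*"${BASH_REMATCH[0]}"}
  done
}

# Loop over the ids for further processing; here each one is just echoed.
while IFS= read -r id; do
  echo "$id"
done < <(yourid_ids "$string")
```

Each pass matches the leftmost remaining id, prints the captured digits, then cuts the string just past that match so the next pass finds the following one. Keeping $re in a variable and leaving it unquoted inside [[ ]] makes bash treat it as a regular expression rather than a literal string.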