
I'm trying to extract an attribute value from an HTML element that I already have in a variable. I'm currently using Zsh, but I'd like it to work in Bash as well.

The current variable has the value:

<span class="alter" fill="#ffedf0" data-count="0" data-more="none"/>

and I would like to get the value of data-count (in this case 0, but it could be an integer of any length).

I have tried using cut, sed and variable expansion as explained in this question, but I haven't managed to adapt the regexes, or maybe it has to be done differently for Zsh.

CarlosAS

3 Answers

2

Could you please try the following.

awk 'match($0,/data-count=[^ ]*/){print substr($0,RSTART+12,RLENGTH-13)}' Input_file

Explanation: awk's match function tests the regex data-count=[^ ]*, which matches everything from data-count= up to the next space. If a match is found, the built-in variables RSTART and RLENGTH are set to the start position and length of the match. The substr call then skips the 12 characters of data-count=" and drops the closing quote, so only the value of data-count is printed.
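Since the question has the element in a variable rather than a file, here is a minimal sketch of feeding that variable to the same awk idea on stdin. The variable names html and count are my own, and the regex is tightened to stop at the closing quote (my adjustment, not part of the command above), so it also behaves if data-count happens to be the last attribute before />.

```shell
# Hypothetical variable holding the element from the question
html='<span class="alter" fill="#ffedf0" data-count="0" data-more="none"/>'

# Pipe the variable into awk instead of naming an input file.
# RSTART+12 skips the 12 characters of data-count=" and
# RLENGTH-13 drops those 12 characters plus the closing quote.
count=$(printf '%s\n' "$html" |
  awk 'match($0, /data-count="[^"]*"/) { print substr($0, RSTART + 12, RLENGTH - 13) }')

echo "$count"
```

This only uses printf, a pipe and command substitution, so it should behave the same in Zsh and Bash.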


With sed, could you please try the following.

sed 's/.*data-count="\([^"]*\).*/\1/' Input_file

Explanation: This uses sed's capture groups. The text that follows data-count=" up to the next double quote (that is, the value) is saved in the first group. Because the pattern starts and ends with .*, the s (substitute) command replaces the whole line with \1, which is the captured value.

RavinderSingh13
2

There is no reason why sed would not work in this situation. For your specific case, I would do something like this:

sed 's/.*data-count="\([0-9]*\)".*/\1/g' file_name.txt

Basically, sed looks for a pattern that contains data-count=", saves whatever matches inside the parentheses \(...\) into \1, and prints that in place of the match (which is the full line, because of the .* on both sides).
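To apply this to a variable instead of a file, something along these lines might do. It is only a sketch: html and count are names I made up, and I have swapped the g flag for -n plus the p flag so that input without a data-count attribute prints nothing instead of being echoed back unchanged.

```shell
# Hypothetical variable holding the element from the question
html='<span class="alter" fill="#ffedf0" data-count="0" data-more="none"/>'

# -n suppresses automatic printing; the trailing p prints only
# lines where the substitution actually happened.
count=$(printf '%s\n' "$html" |
  sed -n 's/.*data-count="\([0-9]*\)".*/\1/p')

echo "$count"
```

An empty count then tells you the attribute was not found, which is easier to check than comparing against the original string.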

Jason K Lai
0

As was said before, to be on the safe side and handle any syntactically valid HTML tag, a parser is strongly advised. But if you know in advance what the general format of your HTML element will look like, the following hack might come in handy:

Assume that your variable is called html:

html='<span class="alter" fill="#ffedf0" data-count="0" data-more="none"/>'

First adapt it a bit:

htmlx="tag ${html%??}"

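This adds the string tag in front and removes the final />. The extra word is needed so that the words pair up as keys and values in the next step: tag becomes the key for <span.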

Now make an associative array:

declare -A fields
fields=( ${=$(tr = ' ' <<<$htmlx)} )

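The tr turns each equals sign into a space, and ${= forces word splitting. Note that ${= is Zsh syntax, so this does not run in Bash, and that the splitting assumes no attribute value contains a space or an equals sign. You can now access the values of your attributes with, say,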

echo $fields[data-count]

Note that this still has the surrounding double quotes. You can easily remove them with

echo ${${fields[data-count]%?}#?}

Of course, once you do this hack, you have access to all attributes in the same way.
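The ${=...} and nested ${${...}} forms are Zsh-only. If you just need the one attribute and want the same code in Zsh and Bash, the prefix and suffix stripping used above can be done with plain parameter expansion. This is a rough sketch under the same assumption about the format; rest and count are names I picked for illustration.

```shell
# Same hypothetical variable as above
html='<span class="alter" fill="#ffedf0" data-count="0" data-more="none"/>'

case $html in
  *'data-count="'*)
    rest=${html#*data-count=\"}   # drop everything up to and including data-count="
    count=${rest%%\"*}            # drop everything from the next double quote on
    ;;
  *)
    count=''                      # attribute not present
    ;;
esac

echo "$count"
```

The case guard is there because, without it, an element lacking data-count would leave rest equal to the whole string and count would end up holding unrelated text.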

user1934428