Shell Script Retrieve Data from html tag

Question

I want to get the value between the </em> and </a> tags (4,519 in the sample below) using a shell script. How can I do that?

id='idusedMemory' alt='graph'/></em>4,519</a> Mb / 64,309 Mb&nbsp;&nbsp;&nbsp;</td><td>

Please [Don't Parse XML/HTML With Regex.](https://stackoverflow.com/a/1732454/3776858). I suggest to use an XML/HTML parser (xmlstarlet, xmllint ...). — Cyrus, Sep 15 '22 at 10:17

score 1 · Answer 1 · answered Sep 15 '22 at 10:27

Using a grep that supports the -P (Perl-compatible regex) flag, such as GNU grep:

grep -Po '(?<=</em>).*(?=</a>)' file

or

echo "id='idusedMemory' alt='graph'/></em>4,519</a> Mb / 64,309 Mb&nbsp;&nbsp;&nbsp;</td><td>" | grep -Po '(?<=</em>).*(?=</a>)'
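One caveat with the pattern above: `.*` is greedy, so if a line contained more than one `</a>`, the match would run up to the last one. Below is a minimal sketch of a stricter variant, assuming GNU grep built with PCRE support. The sample line with two values is made up for illustration and is not from the question.

```shell
# Hypothetical sample line with two </em>...</a> pairs (not from the question).
line="</em>4,519</a> Mb / </em>64,309</a> Mb"

# [^<]* cannot cross a tag boundary, so each value is matched on its own.
# grep -o prints every match on a separate line.
values=$(printf '%s\n' "$line" | grep -Po '(?<=</em>)[^<]*(?=</a>)')
printf '%s\n' "$values"
```

With the single-value line from the question, this behaves the same as the original pattern.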

As suggested in the comments, don't parse HTML/XML with tools like this. Use a utility designed for parsing such files.

answered Sep 15 '22 at 10:27

Jetchisel

score 0 · Answer 2 · answered Sep 15 '22 at 10:14

Just use grep with the -o switch to print only the matching part of the line:

grep -o "</em>.*</a>" test.txt

.* matches any sequence of characters (zero or more).
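Note that `grep -o` prints the whole match, so the output here is `</em>4,519</a>` with the delimiters still attached. One possible way to strip them, sketched with a follow-up sed call (the variable names `line` and `value` are only for illustration):

```shell
# Sample line taken from the question.
line="id='idusedMemory' alt='graph'/></em>4,519</a> Mb / 64,309 Mb&nbsp;&nbsp;&nbsp;</td><td>"

# grep -o keeps "</em>4,519</a>"; sed then removes the leading and trailing tag.
value=$(printf '%s\n' "$line" | grep -o "</em>.*</a>" | sed -e 's@^</em>@@' -e 's@</a>$@@')
printf '%s\n' "$value"
```

This works with any POSIX grep and sed, so it may be useful where -P is not available.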

answered Sep 15 '22 at 10:14

Dominique

score 0 · Answer 3 · answered Sep 15 '22 at 10:20

If your HTML string contains only one substring like that, you can use a regular expression with sed:

echo "id='idusedMemory' alt='graph'/></em>4,519</a> Mb / 64,309 Mb&nbsp;&nbsp;&nbsp;</td><td>" | sed -rn 's@^.*</em>(.*)</a>.*$@\1@p'

Output:

4,519
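If the string is already in a shell variable, the same result can probably be had without any external tool. This is a different technique from the sed command above: plain POSIX parameter expansion, sketched under the assumption that the value sits between the first `</em>` and the next `</a>`. The variable names are hypothetical.

```shell
# Sample line taken from the question.
line="id='idusedMemory' alt='graph'/></em>4,519</a> Mb / 64,309 Mb&nbsp;&nbsp;&nbsp;</td><td>"

# Remove the shortest prefix ending in </em>, i.e. everything up to the first </em>.
rest=${line#*"</em>"}

# Remove the longest suffix starting with </a>, i.e. everything from the first </a> on.
value=${rest%%"</a>"*}
printf '%s\n' "$value"
```

Like the sed version, this only makes sense for a single, predictable occurrence per string.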

If you have something more complicated, you may want to check parsing XML in bash. E.g., here.

Hope that helps.
