Using a regex is a bad choice for parsing data in XML/HTML - see this question/answer.
You can use htmlutils, however - on Debian, Ubuntu, and Arch, the package is html-xml-utils. This comes with an application, hxselect, which can perform HTML parsing on the command line using CSS selectors. From the docs page:
hxselect [ -i ] [ -c ] [ -l language ] [ -s separator ] selectors
hxselect reads a well-formed XML document and outputs all elements and attributes that match one of the CSS selectors that are given as an argument.
In your case, you can use a command like:
cat something.html | hxselect -i -c -s '\n' .nbLineValue
The options used here read as follows:
-i: Match case-insensitively. This is good for HTML, where element tags can be any case.
-c: Display only the content (body) of each element, not the tags surrounding it. This ensures you get just 77, not the surrounding markup.
-s '\n': Output a single newline after each matching element, for ease of parsing.
.nbLineValue: Select all elements with class nbLineValue.
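To see the options working together, here is a minimal sketch that runs the same hxselect invocation against a small sample file and then loops over the results. The sample markup, the second value 42, and the variable names are assumptions made up for illustration; only the class name nbLineValue and the value 77 come from the question. It assumes hxselect is installed and that the input is already well-formed.

```shell
#!/bin/sh
# Hypothetical sample input; a real page will look different.
sample=$(mktemp)
cat > "$sample" <<'EOF'
<html><body>
<p>Lines: <span class="nbLineValue">77</span></p>
<p>Lines: <span class="nbLineValue">42</span></p>
</body></html>
EOF

# Same options as above; reading via redirection avoids the extra cat process.
values=$(hxselect -i -c -s '\n' .nbLineValue < "$sample")

# -s '\n' puts each match on its own line, so a read loop can consume them.
printf '%s\n' "$values" | while IFS= read -r value; do
    printf 'value: %s\n' "$value"
done

rm -f "$sample"
```

If the real page is not well-formed XML, hxnormalize -x from the same package can be placed in front of hxselect in the pipeline to clean it up first; that step is left out here because the sample is already well-formed.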