Print a value from an HTML variable in the Linux shell

Question

I'm trying to print the value of a variable in an HTML file. The file could be called something.html, and I need to print the number inside the nbLineValue variable, which in this case is 77:

<span class="nbLineLabel"></span><span class="nbLineValue">77</span>

Any ideas?

EDIT: I managed to solve the problem with the following command:

grep -oP '<span class="nbLineLabel"></span><span class="nbLineValue">\K[[:digit:]]*' something.html
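Since the number changes from file to file, the same idea can be wrapped in a small function that takes any number of files. This is only a sketch: it assumes GNU grep built with PCRE support (`-P`), that the `nbLineValue` span sits on a single line, and the function name `print_nb_line_value` is made up for illustration. It also anchors on a shorter prefix than the command above, which may or may not suit your files.

```shell
# Hypothetical helper: print the digits inside every nbLineValue span
# of each file passed as an argument.
# Assumes GNU grep with PCRE support (-P).
print_nb_line_value() {
    # -h: no file name prefix; -o: print only the matched part;
    # \K: drop everything matched so far, keeping only the digits.
    grep -hoP 'class="nbLineValue">\K[[:digit:]]+' "$@"
}

# Demo on a throwaway file containing the line from the question.
tmp=$(mktemp)
printf '%s\n' '<span class="nbLineLabel"></span><span class="nbLineValue">77</span>' > "$tmp"
print_nb_line_value "$tmp"   # prints 77
rm -f "$tmp"
```

With the function defined, `print_nb_line_value *.html` would print one number per matching span across all the files.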

Provide a minimal, complete, and verifiable example by editing your post. Your description isn't clear enough to tell what you're asking for. – hnefatl, Dec 24 '17 at 21:48
Do you want the inner HTML from any element with the `nbLineValue` class? Or only from `span` elements, or only from elements with "simple text" in their body? – hnefatl, Dec 24 '17 at 21:50
I'm a total noob at HTML. What I need is the number where the 77 is. It will change in each file. – Sérgio Beltrão, Dec 24 '17 at 21:55
Here's an idea: use `awk` or `grep` with an appropriate regular expression. Also take a look at [how-to-extract-substring-and-numbers-only-using-grep-sed](https://stackoverflow.com/questions/15371450/how-to-extract-substring-and-numbers-only-using-grep-sed). – PKey, Dec 24 '17 at 21:59
HTML doesn't have native variables. Do you mean the line number of `nbLineValue`, or are you trying to use some sort of templating system to generate the HTML file? Please clarify in your question. – forumulator, Dec 24 '17 at 22:02

hnefatl · Answer 1 · 2017-12-24T22:28:44.177

1

Using a regex is a bad choice for parsing data in XML/HTML - see this question/answer.

You can use HTML-XML-utils instead; on Debian, Ubuntu, and Arch, the package is html-xml-utils. It comes with an application, hxselect, which can perform HTML parsing on the command line using CSS selectors. From the docs page:

hxselect [ -i ] [ -c ] [ -l language ] [ -s separator ] selectors

hxselect reads a well-formed XML document and outputs all elements and attributes that match one of the CSS selectors that are given as an argument.

In your case, you can use a command like:

hxselect -i -c -s '\n' .nbLineValue < something.html

The options used here read as follows:

-i: Match case-insensitively. This is good for HTML, where element tags can be in any case.
-c: Display only the content (body) of each element, not the tags surrounding it. This ensures you get just 77, not the surrounding markup.
-s '\n': Output a single newline after each matching element, for ease of parsing.
.nbLineValue: Select all elements with class nbLineValue.
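Real-world HTML is often not well-formed XML, and hxselect may refuse it with "Input is not well-formed. (Maybe try normalize?)". One possible workaround is to run the file through hxnormalize (from the same package) first. This is a sketch, not something tested against your file: it assumes both tools are installed, and the function name `print_nb_line_value_hx` is hypothetical.

```shell
# Hypothetical helper: normalize the HTML, then select by class.
# Assumes hxnormalize and hxselect (both from html-xml-utils) are installed.
print_nb_line_value_hx() {
    # -x tells hxnormalize to write XML-style markup,
    # which is the well-formed input hxselect expects.
    hxnormalize -x "$1" | hxselect -i -c -s '\n' .nbLineValue
}
```

You would then call it as `print_nb_line_value_hx something.html`. Normalizing first trades a little speed for tolerance of unclosed tags and similar problems, which is usually the right trade for pages you did not write yourself.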

edited Dec 24 '17 at 22:28

answered Dec 24 '17 at 22:11

hnefatl


1

It gives me the following output: "Input is not well-formed. (Maybe try normalize?)" – Sérgio Beltrão Dec 24 '17 at 22:22
With a file containing just the file contents you showed in your question, and using the command line above, I get `77`. What's different between your setup and what you've put in your question? – hnefatl Dec 24 '17 at 22:24
There is more code inside that HTML file, not only the single line that I posted. With only that line it works; with the whole file it doesn't :/ – Sérgio Beltrão Dec 24 '17 at 22:32
@SérgioBeltrão Okay, post more of the file in your question or upload it to a file sharing site and send a link. You'll have invalid HTML in it somewhere. The `hxselect` application should point out the line number of syntax errors. – hnefatl Dec 24 '17 at 22:37
1

`grep -oP '<span class="nbLineLabel"></span><span class="nbLineValue">\K[[:digit:]]*' something.html` solved the problem. Thanks for your time! – Sérgio Beltrão Dec 24 '17 at 23:02
