
I'm trying to print the value of a variable in an HTML file. The file could be called something.html, and I need to print the number inside the nbLineValue variable, which in this case is 77:

<span class="nbLineLabel"></span><span class="nbLineValue">77</span>

Any ideas?

EDIT: I managed to solve the problem with the following command:

grep -oP '<span class="nbLineLabel"></span><span class="nbLineValue">\K[[:digit:]]*' something.html
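
A minimal sketch of why this works, assuming GNU grep built with PCRE support (the `-P` flag). The temporary sample file is made up for illustration, and the pattern is shortened to the `nbLineValue` span only, which is an assumption about what is unique in the real file:

```shell
#!/usr/bin/env bash
# Hypothetical sample file reproducing the line from the question.
sample=$(mktemp)
printf '%s\n' '<div><span class="nbLineLabel"></span><span class="nbLineValue">77</span></div>' > "$sample"

# -o prints only the matched text; \K drops everything matched before it,
# so only the digits after the opening tag are printed.
# [[:digit:]]+ (instead of *) skips spans that contain no digits.
value=$(grep -oP '<span class="nbLineValue">\K[[:digit:]]+' "$sample")
echo "$value"

rm -f "$sample"
```

This relies on the tag being written exactly as in the pattern, so a change in attribute order, quoting, or whitespace would make the match fail.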
  • Please provide a minimal, complete, and verifiable example by [editing](https://stackoverflow.com/posts/47964222/edit) your post. Your description isn't clear enough to tell what you're asking for. – hnefatl Dec 24 '17 at 21:48
  • Do you want the inner HTML from any element with the `nbLineValue` class? Or only from `span` elements, or only from elements with "simple text" in their body? – hnefatl Dec 24 '17 at 21:50
  • I'm a total noob at HTML. What I need is the number where the 77 is. It will change in each file. – Sérgio Beltrão Dec 24 '17 at 21:55
  • Here's an idea: use `awk` or `grep` with an appropriate regular expression. Also take a look at [how-to-extract-substring-and-numbers-only-using-grep-sed](https://stackoverflow.com/questions/15371450/how-to-extract-substring-and-numbers-only-using-grep-sed). – PKey Dec 24 '17 at 21:59
  • HTML doesn't have native variables. Do you mean the line number of `nbLineValue`, or are you trying to use some sort of templating system to generate the HTML file? Please clarify in your question. – forumulator Dec 24 '17 at 22:02
  • I want the number inside nbLineValue. – Sérgio Beltrão Dec 24 '17 at 22:06

1 Answer


Using a regex is a bad choice for parsing data in XML/HTML - see this question/answer.

You can use HTML-XML-utils instead. On Debian, Ubuntu, and Arch, the package is named html-xml-utils. It includes a tool called hxselect, which can parse HTML on the command line using CSS selectors. From the docs page:

hxselect [ -i ] [ -c ] [ -l language ] [ -s separator ] selectors

hxselect reads a well-formed XML document and outputs all elements and attributes that match one of the CSS selectors that are given as an argument.

In your case, you can use a command like:

hxselect -i -c -s '\n' .nbLineValue < something.html

The options used here read as follows:

  • -i: Match case-insensitively. This is good for HTML where element tags can be any case.
  • -c: Display only the content (body) of each element, not the tags surrounding it. This ensures you get just 77, not the surrounding markup.
  • -s '\n': Output a single newline after each matching element, for ease of parsing.
  • .nbLineValue: Select all elements with the class nbLineValue.
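
Because hxselect expects well-formed XML, real-world HTML often has to be cleaned up first. A minimal sketch of one way to do that, assuming `hxnormalize` from the same html-xml-utils package is installed (`extract_nb_line_value` is a hypothetical helper name used only for illustration):

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the content of every element with class
# nbLineValue in the HTML file given as the first argument.
extract_nb_line_value() {
    # hxnormalize -x rewrites the input using XML conventions, so that
    # hxselect receives well-formed markup even if the source HTML has
    # unclosed tags.
    hxnormalize -x "$1" | hxselect -i -c -s '\n' .nbLineValue
}

# Usage: extract_nb_line_value something.html
```

Whether normalization is enough depends on how broken the input is; badly malformed files may still need fixing by hand.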
hnefatl
  • It gives me the following output: "Input is not well-formed. (Maybe try normalize?)" – Sérgio Beltrão Dec 24 '17 at 22:22
  • With a file containing just the contents you showed in your question, and using the command line above, I get `77`. What's different between your setup and what you've put in your question? – hnefatl Dec 24 '17 at 22:24
  • There is more code inside that HTML file, not only the single line that I posted. With only that line it works, but with the whole file it doesn't :/ – Sérgio Beltrão Dec 24 '17 at 22:32
  • @SérgioBeltrão Okay, post more of the file in your question or upload it to a file sharing site and send a link. You'll have invalid HTML in it somewhere. The `hxselect` application should point out the line number of syntax errors. – hnefatl Dec 24 '17 at 22:37
  • `grep -oP '<span class="nbLineLabel"></span><span class="nbLineValue">\K[[:digit:]]*' something.html` solved the problem. Thanks for your time! – Sérgio Beltrão Dec 24 '17 at 23:02