When I curl a page, for example:

curl http://test.com

I get a result like the following:

<html>
<body>
<div>
  <dl>
    <dd> 10 times </dd>
  </dl>
</div>
</body>
</html>

The result I want is simply 10 times.

Is there a good way to achieve this?

If anyone has suggestions, please let me know.

Thanks

Heisenberg
  • Does this answer your question? [Parsing HTML on the command line; How to capture text in <strong></strong>?](https://stackoverflow.com/questions/18746957/parsing-html-on-the-command-line-how-to-capture-text-in-strong-strong) – costaparas Feb 15 '21 at 10:54
  • Please search "curl xpath bash", plenty of results... – marekful Feb 15 '21 at 10:56
  • Preferably, use one of the answers in the linked post that rely on a proper HTML parser rather than a regex. The same method works for other tags. – costaparas Feb 15 '21 at 10:56

3 Answers

If you are unable to use an HTML parser for whatever reason, then for the simple HTML in your example you could use:

 curl http://test.com | sed -rn 's@(^.*<dd>)(.*)(</dd>)@\2@p'

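Pipe the output of the curl command into sed. The -n option suppresses sed's default output, and -r (GNU sed) or -E enables extended regular expressions. The pattern splits each matching line into three groups and replaces the line with the second group only; the trailing p prints the result. Note that the spaces between the tags and the text are kept, so the output here is " 10 times " with the padding included.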
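If the padding is unwanted, the second group can be made to start and end on a non-space character. This is only a minimal sketch: it assumes the page is the sample from the question (held in a hypothetical html variable instead of being fetched with curl), that each <dd> element sits on a single line, and that its text is at least two characters long.

```shell
#!/bin/sh
# Sample page from the question; in practice this would come from
# something like: curl -s http://test.com
html='<html>
<body>
<div>
  <dl>
    <dd> 10 times </dd>
  </dl>
</div>
</body>
</html>'

# -n suppresses default output, -E enables extended regular expressions.
# Group 1 absorbs everything up to <dd> plus any spaces after it,
# group 2 starts and ends on a non-space character,
# group 3 absorbs the spaces before </dd> and the rest of the line.
result=$(printf '%s\n' "$html" |
  sed -En 's@(^.*<dd> *)([^ ].*[^ ])( *</dd>.*)@\2@p')

printf '%s\n' "$result"
```

Like the original command, this still treats the HTML as plain text, so it may break as soon as the markup spans several lines or the tags carry attributes.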

Raman Sailopal

I have a similar issue, but I need to extract the selected option (Yes or No) from the HTML response returned by curl.

<td valign=top>
    <select name="active">
          <option selected value="false">No</option>
          <option  value="true">Yes</option>
    </select>
</td>

The curl command below sends the auth parameters, finds the selected option and prints its text (No for the HTML above):

curl -d ".username=Jhon" -d ".password=123456" "https://test.com/" | grep "option selected" | tail -1 | awk -F'<|>'  '/option/{print $3}'

If the response contains several selected options, use head or tail to pick the matching line you want.
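The same idea can be narrowed so that it returns both the value attribute and the visible text of the selected option. This is only a minimal sketch: it assumes the markup shown above (held in a hypothetical html variable rather than fetched with curl), that the selected attribute comes directly before value, and that each <option> sits on its own line.

```shell
#!/bin/sh
# Sample response from this answer; in practice it would be the curl output.
html='<td valign=top>
    <select name="active">
          <option selected value="false">No</option>
          <option  value="true">Yes</option>
    </select>
</td>'

# Print "value text" for every selected option.
# Group 1 is the value attribute, group 2 is the text between the tags.
# head -n 1 keeps only the first match if there are several.
selected=$(printf '%s\n' "$html" |
  sed -En 's@.*<option selected value="([^"]*)">([^<]*)</option>.*@\1 \2@p' |
  head -n 1)

printf '%s\n' "$selected"
```

Swapping head -n 1 for tail -n 1 keeps the last match instead, as in the command above.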

akshay_sushir
$ curl ... | awk -F' *<[/]?dd> *' 'NF>1{print $2}'
$ curl ... | awk  'match($0,"<dd> *(.*) *</dd>",a){print a[1]}'
10 times
ufopilot