
I'm fetching a URL with cURL, which returns HTML. I then use awk to extract the sensor name and its status.

(curl <MY URL> | awk -F"Sensor<\/th><td>" '{print $2}' | awk -F"<\/td></tr>" '{print $1}'; \
 curl <my URL> | awk -F"Status<\/th><td><strong>" '{print $2}' | awk -F"<\/strong>" '{printf $1}' \
) | tr -d '\n' >> output

The cURL output looks like this:

<html><head><title>Sensor status for NumberOfThreadsSensor-NumberOfThreads</title></head><body>
<h1>Sensor status for NumberOfThreadsSensor-NumberOfThreads</h1>
<table>
<tr><th>Plugin</th><td>NumberOfThreadsSensor</td></tr><tr><th>Sensor</th><td>NumberOfThreads</td></tr><tr><th>Status</th><td>Ok</td></tr><tr><th>Created</th><td>Fri Aug 14 09:03:10 UTC 2020 (13 seconds ago)</td></tr><tr><th>TTL</th><td>30 seconds</td></tr><tr><th>Short message</th><td>1;14;28</td></tr><tr><th>Long message</th><td>1 [interval: 1 min];14 [interval: 30 min];28 [interval: 60 min]</td></tr></table>
<h2>Formats</h2><p>The status shown on this page is also available in the following machine-friendly formats:</p>
<ul>
<li><a href="/admin/monitoring/NumberOfThreadsSensor-NumberOfThreads/status">A simple status string</a>, Possible values: OK, WARNING, CRITICAL, UNKNOWN.</li>
<li><a href="/admin/monitoring/NumberOfThreadsSensor-NumberOfThreads/nagios">Nagios plugin output</a>, output formatted for easy integration with Nagios.</li>
<li><a href="/admin/monitoring/NumberOfThreadsSensor-NumberOfThreads/xml">Full xml</a> all available data in xml for easy parsing by ad-hoc monitoring tools.</li>
<li><a href="/admin/monitoring/NumberOfThreadsSensor-NumberOfThreads/prometheus">Prometheus output</a>, all available data in prometheus format</li>
</ul>
<p>Please do not rely on the output of this page for automated monitoring, use one of the formats above.</p>
</body></html>

Current output: ScoreProcessorWarning

Expected output: ScoreProcessor Warning

Please help me simplify my shell script. I'm still learning. Thanks for the help.

rajkumar
  • 1
    Welcome to SO, and thanks for showing your efforts. Please add a sample of the input and the expected output to your question and let us know. – RavinderSingh13 Aug 14 '20 at 08:00
  • 1
    I suggest using an HTML-aware utility to parse HTML output, e.g. `xmllint`. – KamilCuk Aug 14 '20 at 09:01
  • The current and expected output are the same: `ScoreProcessor Warning`. Also, the curl output you included has a mismatch (unclosed in line 8). Please verify the input and the expected output.
    – dash-o Aug 14 '20 at 09:02
  • @dash-o I have corrected current & expected output. – rajkumar Aug 14 '20 at 09:43
  • @rajkumar thanks for fixing the input. It is not clear what the expected output is (are you just trying to get a space between "ScoreProcessor" and "Warning"?). Also, the script that you provided does not generate "ScoreProcessorWarning" for the input that you provided. – dash-o Aug 14 '20 at 15:20

1 Answer


With the input presented saved in /tmp/input.txt:

<h1>Sensor status for EventProcessorStatus-ScoreProcessor</h1>
<table>
<tr><th>Plugin</th><td>EventProcessorStatus</td></tr><tr><th>Sensor</th><td>ScoreProcessor</td></tr><tr><th>Status</th><td><strong>Warning</strong></td></tr><tr><th>Created</th><td>Fri Aug 10 00:16:23 UTC 2020 (0 seconds ago)</td></tr><tr><th>TTL</th><td>30 seconds</td></tr><tr><th>Short message</th><td>Endpoint is running, but has errors</td></tr><tr><th>Long message</th><td>Endpoint is running, but has errors<br/>
Number of errors in background process (xxxx) logs: 4<br/>
</td></tr></table>
<h2>Performance data</h2><table>

With my very limited knowledge of xmllint, I ended up with:

# Extract only the table rows and get the text of every cell
xmllint --html --xpath '//table//tr//text()' /tmp/input.txt |
# Each row has two cells (header and value), so join every two lines with a tab
sed 'N;s/\n/\t/' |
# Keep only the Sensor and Status values: hold the sensor, then print sensor and status
sed -n '/Sensor\t/{s///;h}; /Status\t/{s///;x;G;p}' |
# Read the sensor and status into shell variables
{ IFS= read -r name; IFS= read -r status; echo "name=$name status=$status" ;}

which outputs:

name=ScoreProcessor status=Warning
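
A possible variation, sketched under the assumption that the installed xmllint accepts `-` for standard input and prints XPath string results on their own line: asking XPath for `string(...)` of a single cell returns one value per call, so the result does not depend on how xmllint separates multiple text nodes. The helper name `cell` and the shortened sample table are made up for illustration.

```shell
# Hypothetical helper: print the text of the <td> in the row whose <th> is $1.
# Reads HTML on stdin; parser warnings are discarded.
cell() {
    xmllint --html --xpath "string(//tr[th='$1']/td)" - 2>/dev/null
}

# Shortened sample of the table from the question (illustrative only).
sample='<table><tr><th>Sensor</th><td>ScoreProcessor</td></tr><tr><th>Status</th><td><strong>Warning</strong></td></tr></table>'

name=$(printf '%s\n' "$sample" | cell Sensor)
status=$(printf '%s\n' "$sample" | cell Status)
echo "name=$name status=$status"
```

This runs xmllint once per field, which is slower than a single pass but keeps each extraction independent of the others.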
KamilCuk
  • Thank you! I placed the exact content in /tmp/input.txt and ran xmllint, but I'm not getting the sensor name and status; the output is only "name= status=". I tried with the actual curl output as well. – rajkumar Aug 14 '20 at 09:57
  • Seems like [this issue](https://stackoverflow.com/questions/18532948/how-to-append-a-newline-after-every-match-using-xmlint-xpath). I have xmllint 20910 :/ . One of the workarounds worked with removing empty lines [repl bash link with code](https://repl.it/@kamilcukrowski/OffbeatSeashellCoordinate#main.sh). I think it should be possible with xmllint or xmlstarlet to select the column row... – KamilCuk Aug 14 '20 at 10:05
  • I appreciate your work, but I don't have the option to upgrade from xmllint 20910 :( Is there any way to add a space between the name and the status in my current output? – rajkumar Aug 14 '20 at 10:54
  • In the end I simply saved the sensor and the status in variables and printed them to the file with the required space between them. – rajkumar Aug 26 '20 at 09:10
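
The approach described in the last comment (store both values in variables, then print them with a space) could look roughly like the sketch below. It is not the asker's actual script: the `field` helper, the `$url` placeholder, and the shortened sample HTML are assumptions for illustration. It fetches the page once instead of twice and uses sed rather than chained awk calls.

```shell
#!/bin/sh
# Hypothetical helper: print the text of the <td> that follows <th>LABEL</th>.
# Reads HTML on stdin; an optional <strong> wrapper around the value is dropped.
field() {
    sed -n "s:.*<th>$1</th><td>\(<strong>\)\{0,1\}\([^<]*\).*:\2:p"
}

# Shortened sample standing in for the page; in real use this could be
#   html=$(curl -s "$url")    # $url is a placeholder for the monitoring URL
html='<tr><th>Plugin</th><td>EventProcessorStatus</td></tr><tr><th>Sensor</th><td>ScoreProcessor</td></tr><tr><th>Status</th><td><strong>Warning</strong></td></tr>'

sensor=$(printf '%s\n' "$html" | field Sensor)
status=$(printf '%s\n' "$html" | field Status)

# Prints "ScoreProcessor Warning"; add ">> output" to append to a file instead.
printf '%s %s\n' "$sensor" "$status"
```

Because the space comes from the `printf` format string, there is no need for `tr -d '\n'`, which is what glued the two values together in the original pipeline.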