From what I gather, it is generally considered a bad idea to parse HTML in Bash. But a person never learns to ride a bike without falling a few times in the process.
And so, using Bash, I'm trying to extract some data from an HTML webpage. The relevant pieces I am trying to obtain are data-nick="someguy99", which is a username, and then the message "Hello. This is the data I wish to obtain." displayed on the line directly underneath.
<body>
<div id="main">
<div class="content">
<div class="block">
<div class="section">
<div class="chat-holder">
<div class="chat-box">
<div class="chat-list">
<div id="0" class="text" style="color: rgb(73, 73, 73);">
<span class="username messagelabel" data-nick="someguy99">someguy99:</span>
"Hello. This is the data I wish to obtain."
Using wget, I have not been able to traverse past "chat-list". I have tried piping the output to other programs, for example:
wget -O - http://website.url | lynx -source -dump
But nothing is working; I always get the same output. For instance:
wget --quiet -F -O - http://website.url/example | \
lynx -dump -source -stdin | grep 'chat-list'
and the result...
var img = $('.chat-list img[title="' + slug + '"]');
This is not the same as the output seen in the document tree when using a web browser. And replacing grep 'chat-list' with grep 'data-nick' returns no matching patterns at all.
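To make the goal concrete, here is a minimal sketch of the extraction I am after, run against the markup from the browser's document tree pasted into a here-document rather than against the wget output. The function name sample_html and the variables nick and message are just placeholders for illustration, and it assumes the message always sits on the line directly after the span.

```shell
#!/usr/bin/env bash

# Hypothetical stand-in for the page: the markup as the browser shows it.
sample_html() {
cat <<'EOF'
<div class="chat-list">
<div id="0" class="text" style="color: rgb(73, 73, 73);">
<span class="username messagelabel" data-nick="someguy99">someguy99:</span>
"Hello. This is the data I wish to obtain."
EOF
}

# Pull the value of data-nick="..." out of the span line.
nick=$(sample_html | sed -n 's/.*data-nick="\([^"]*\)".*/\1/p')

# Take the line directly after the one that contains data-nick.
message=$(sample_html | awk 'found { print; exit } /data-nick=/ { found = 1 }')

printf 'nick: %s\nmessage: %s\n' "$nick" "$message"
```

On the pasted markup this should set nick to someguy99 and message to the quoted line beneath it, so the pattern matching itself is not where I am stuck.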
What am I doing wrong? How do I parse deeper to obtain the data I seek?
My brain feels a bit fried right now, so if I left out any relevant information just let me know and I'll provide more details.
- Mac OS X 10.11.5
- GNU bash 4.3.42
Thank you.