How to grep a table in an HTML file using a bash script

Question

Hi everyone,

I have a question from my work. The real file contains much more code, but I just want to grep an HTML snippet like this:

<td>USER</td>
 <td><pre class=sf-dump id=sf-dump-957164173 data-indent-pad="  ">"<span class=sf-dump-str title="34 characters">Alex</span>"
 </pre><script>Sfdump("sf-dump-957164173")</script>
    </td>
 </tr>
 <tr>

I want the output to be only Alex.

I'm trying this grep command:

grep -oP "<td>USER<\/td>\s+<td><pre.*>(.*?)<\/span>" c.html

But the command gives me no result. I have also tried sed, but I want to learn how to do it with grep too.

Thank you

Wrong tool for the job. See https://stackoverflow.com/a/1732454/14122 — Charles Duffy, Aug 14 '20 at 15:51
...a better approach is to use a command-line tool that lets you run real XPath queries against your HTML (XPath being a query language specifically designed for structured documents, and also widely used from JavaScript for interacting with HTML in particular). — Charles Duffy, Aug 14 '20 at 15:51
...so, for an example of using xpath from bash, see https://stackoverflow.com/questions/4984689/bash-xhtml-parsing-using-xpath; or (showing how to handle HTML that isn't XHTML from input) https://stackoverflow.com/questions/37072931/getting-html-elements-via-xpath-in-bash — Charles Duffy, Aug 14 '20 at 15:52
@CharlesDuffy Yes sir, but I want to do it this way because I want to become a bash expert xD. Can't we do it with the grep command, sir? — Edo Permata, Aug 14 '20 at 16:00
@EdoPermata A wise bash expert would follow the advice given by Charles Duffy: grep and sed are very bad tools for parsing HTML, and there is an endless list of cases where they will lead you to bad results — yolenoyer, Aug 14 '20 at 16:14
@EdoPermata In your case I was thinking about `w3m`, which can be used to strip html tags: `w3m c.html -T text/html -dump | grep '^"' | tr -d '"'`, but `xpath` is really better suited for this kind of work. — yolenoyer, Aug 14 '20 at 16:18
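
The XPath approach suggested in the comments could look roughly like the sketch below. It assumes `xmllint` (from libxml2) is installed, and it wraps the snippet from the question in a hypothetical complete table written to a temporary `sample.html`. The real c.html may be structured differently, so the XPath expression is an illustration, not a drop-in command.

```shell
#!/usr/bin/env bash
# Sketch only: assumes xmllint (libxml2) is available.
# sample.html is a hypothetical stand-in for the real c.html.
tmpdir=$(mktemp -d)
cat > "$tmpdir/sample.html" <<'EOF'
<html><body><table>
 <tr>
 <td>USER</td>
 <td><pre class=sf-dump id=sf-dump-957164173 data-indent-pad="  ">"<span class=sf-dump-str title="34 characters">Alex</span>"
 </pre><script>Sfdump("sf-dump-957164173")</script>
    </td>
 </tr>
</table></body></html>
EOF

# --html  : parse with the lenient HTML parser instead of the XML parser
# --xpath : evaluate the expression and print the result
# The expression finds the cell whose text is USER, moves to the next
# cell in the same row, and returns the text of the first span inside it.
result=$(xmllint --html --xpath \
  'string(//td[normalize-space(.)="USER"]/following-sibling::td[1]//span)' \
  "$tmpdir/sample.html" 2>/dev/null)

echo "$result"
rm -rf "$tmpdir"
```

Because the query follows the document structure rather than the exact text layout, it should keep working if whitespace or attribute order in the markup changes.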

score 0 · Answer 1 · answered Aug 16 '20 at 15:06

You can use perl to extract the data in c.html. The -00 switch reads the file in paragraph mode, so the regex can match across line breaks:

perl -00ne 'print "$1\n" if m{<td>USER</td>\s*<td><pre.+?characters">(.+?)</span>}' c.html

produces

Alex
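
For completeness, the grep command from the question fails because grep matches one line at a time, so `\s+` never sees the newline between the two cells. A minimal sketch of a grep-only variant follows; it assumes GNU grep built with PCRE support (`-P`) and uses a hypothetical temporary `sample.html` holding the snippet from the question.

```shell
#!/usr/bin/env bash
# Sketch only: assumes GNU grep with PCRE (-P) support.
# sample.html is a hypothetical stand-in for the real c.html.
tmpdir=$(mktemp -d)
cat > "$tmpdir/sample.html" <<'EOF'
<td>USER</td>
 <td><pre class=sf-dump id=sf-dump-957164173 data-indent-pad="  ">"<span class=sf-dump-str title="34 characters">Alex</span>"
 </pre><script>Sfdump("sf-dump-957164173")</script>
    </td>
 </tr>
 <tr>
EOF

# -z : records end at NUL bytes, so the whole file is one record
#      and \s* can match across newlines
# -o : print only the matched part
# \K : drop everything matched so far from the reported match
# tr : with -z, grep ends each match with NUL; turn that into a newline
result=$(grep -zoP '<td>USER</td>\s*<td><pre[^>]*>"<span[^>]*>\K[^<]+(?=</span>)' \
  "$tmpdir/sample.html" | tr '\0' '\n')

echo "$result"   # prints: Alex
rm -rf "$tmpdir"
```

Like the perl one-liner, this depends on the exact markup, so it remains fragile compared with a real HTML parser. Note also that perl's -00 only joins lines up to the next blank line; if a blank line could appear between the two cells, -0777 (slurp the whole file) would be the safer switch.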

answered Aug 16 '20 at 15:06

LeadingEdger
