Hi brothers and sisters.

I have a question about my work. The real file contains a lot of code, but I just want to grep an HTML snippet like this:

<td>USER</td>
 <td><pre class=sf-dump id=sf-dump-957164173 data-indent-pad="  ">"<span class=sf-dump-str title="34 characters">Alex</span>"
 </pre><script>Sfdump("sf-dump-957164173")</script>
    </td>
 </tr>
 <tr>

I want the output to be only Alex.

I am trying to use this grep command:

grep -oP "<td>USER<\/td>\s+<td><pre.*>(.*?)<\/span>" c.html

But my command returns no result. I have also tried sed, but I want to learn how to do this with grep too.

Thank you

  • Wrong tool for the job. See https://stackoverflow.com/a/1732454/14122 – Charles Duffy Aug 14 '20 at 15:51
  • ...a better approach is to use a command-line tool that lets you run real XPath queries against your HTML (XPath being a query language specifically designed for structured documents, and also widely used from JavaScript for interacting with HTML in particular). – Charles Duffy Aug 14 '20 at 15:51
  • ...so, for an example of using xpath from bash, see https://stackoverflow.com/questions/4984689/bash-xhtml-parsing-using-xpath; or (showing how to handle HTML that isn't XHTML from input) https://stackoverflow.com/questions/37072931/getting-html-elements-via-xpath-in-bash – Charles Duffy Aug 14 '20 at 15:52
  • @CharlesDuffy Yes sir, but I want to do it this way because I want to become a bash expert xD. Can't we do this with the grep command, sir? – Edo Permata Aug 14 '20 at 16:00
  • @EdoPermata A wise bash expert would follow the advice given by Charles Duffy: grep and sed are very bad tools for parsing HTML, and there is an endless list of cases where they will give you bad results – yolenoyer Aug 14 '20 at 16:14
  • @EdoPermata In your case I was thinking about `w3m`, which can be used to strip HTML tags: `w3m c.html -T text/html -dump | grep '^"' | tr -d '"'`, but `xpath` is really better suited for this kind of work. – yolenoyer Aug 14 '20 at 16:18

1 Answer

You can use perl (with the proper switches and regex) to extract the data in c.html:

perl -00ne 'print "$1\n" if m{<td>USER</td>\s*<td><pre.+?characters">(.+?)</span>}' c.html

produces

Alex
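
If you still want a grep-only variant, here is a minimal sketch. It assumes GNU grep built with PCRE support (the -P switch) and recreates the sample from the question in c.html so it can be run as-is. The original command fails for two reasons: grep matches one line at a time, so \s+ can never cross the newline between the two <td> lines, and -o prints the whole match rather than the capture group. Here -z makes grep read the file as a single record, and \K drops everything matched before the name. The same caveat from the comments applies: this only works while the markup keeps exactly this shape.

```shell
# Recreate the sample from the question (assumed file name: c.html).
cat > c.html <<'EOF'
<td>USER</td>
 <td><pre class=sf-dump id=sf-dump-957164173 data-indent-pad="  ">"<span class=sf-dump-str title="34 characters">Alex</span>"
 </pre><script>Sfdump("sf-dump-957164173")</script>
    </td>
 </tr>
 <tr>
EOF

# -z : use NUL as the record separator, so the whole file is one record
#      and \s+ can span the newline after </td>
# -o : print only the matched text; -P : Perl-compatible regex
# \K : forget everything matched so far, keeping only what follows
# (?=</span>) : require the closing tag without including it in the output
# With -z each match is terminated by NUL, so tr turns that into a newline.
result=$(grep -zoP '<td>USER</td>\s+<td><pre[^>]*>"<span[^>]*>\K[^<]+(?=</span>)' c.html | tr '\0' '\n')
echo "$result"
```

On the sample above this should print the same name as the perl one-liner.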
LeadingEdger