
This is a multidisciplinary question so the answer may not be purely CSS.

I am parsing a large table and my goal is to retrieve only the text outside of the <b></b> tags. I am able to select the rows but stuck on how to only select text outside of the bold tag.

HTML

<div id="tabs-1">
<table width='650' class='subtblfont'>
    <tr><td>&nbsp;</td></tr>
    <tr><td>&nbsp;</td></tr>
    <tr>
        <td><b>Check-in Date:&nbsp;</b>04/20/2013</td>
        <td><b>Check-in Date:&nbsp;</b>04/25/2013</td>
    </tr>
</table>
</div>

Code

$row_content = $results_dom->find('div#tabs-1 tr:nth-child(3) td');

foreach (@$row_content) {
    print "$_\n";
}

Output

<td><b>Check-in Date:&nbsp;</b>04/20/2013</td>
<td><b>Check-in Date:&nbsp;</b>04/25/2013</td>

Desired Output

04/20/2013
04/25/2013

I am able to use regular expressions to pull out the text but that is not an ideal solution at this point. Is there a way to select only the non-bold text?

Not a machine
  • Strategy: traverse all child nodes of the `<td>`. Filter out all `<b>` elements. Extract the text from the rest. But it seems you don't want to get rid of the bold text, but find a date. Extracting all text, then applying a regex to the plain text might be more sensible if that is your goal. – amon Oct 18 '17 at 18:34
  • *"I am able to use regular expressions to pull out the text but that is not an ideal solution at this point"* I wonder why not? It seems that you want to extract the date from the `<td>` element, and relying on that part of the text being unbolded is a strange approach. The bold tag is stylistic, not semantic, while an `mm/dd/yyyy` date sequence is very easy to extract precisely from the text. – Borodin Oct 18 '17 at 18:53
  • @Boro Not all of the elements contain dates. There are hundreds of elements, some with very complex formats. – Not a machine Oct 18 '17 at 19:01
  • @amon To clarify, I do want to get rid of the bold text. I am unsure how to filter out all the `<b>` elements using CSS selectors (or Mojo's version of CSS selectors). – Not a machine Oct 18 '17 at 19:03
  • @Notamachine: Then **Pat's** `text` call should be exactly what you need. – Borodin Oct 18 '17 at 19:04
  • @Notamachine: *"or Mojo's version of CSS selectors"* as far as I know, `Mojo::DOM::CSS` is a pretty much complete implementation of CSS3. – Borodin Oct 18 '17 at 19:12
  • @Borodin The coverage may be complete but the syntax is not always what you might expect if you are a jQuery user as I am. – Not a machine Oct 19 '17 at 03:25
  • @Notamachine: I can't imagine what you may mean. The syntax of CSS is defined independently of both Mojolicious and jQuery, and if you find the implementation in `Mojo::DOM` unfamiliar then it is jQuery that is non-standard. I wonder if you may have been using CSS2 before, although CSS3 is a superset of previous editions and anything that worked before should still be correct. Either way, the point is that you are uncomfortable with CSS3 and I will bear that in mind. – Borodin Oct 19 '17 at 05:33
  • @Notamachine: I've just discovered that jQuery provides a non-standard superset of CSS. In particular the `has` selector, which would be useful here, is not part of standard CSS. I believe it currently has *experimental* status for CSS4, and isn't supported by all browsers. – Borodin Oct 19 '17 at 05:53
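amon's strategy (walk the child nodes of each `<td>`, drop the `<b>` elements, keep the text of the rest) can be sketched with `Mojo::DOM` as below. This is a minimal sketch, not a definitive implementation: it assumes the page is parsed with `Mojo::DOM`, writes the wrapper `id` as `tabs-1` so it matches the selector used in the question, and `text_outside_bold` is a hypothetical helper name.

```perl
use strict;
use warnings;
use Mojo::DOM;

# Sample markup from the question (wrapper id assumed to be "tabs-1")
my $html = <<'HTML';
<div id="tabs-1">
<table width='650' class='subtblfont'>
    <tr><td>&nbsp;</td></tr>
    <tr><td>&nbsp;</td></tr>
    <tr>
        <td><b>Check-in Date:&nbsp;</b>04/20/2013</td>
        <td><b>Check-in Date:&nbsp;</b>04/25/2013</td>
    </tr>
</table>
</div>
HTML

# Hypothetical helper: concatenate the text of every child node of $td
# except <b> elements (comments and other node types are ignored too)
sub text_outside_bold {
    my ($td) = @_;
    my @parts;
    for my $node (@{ $td->child_nodes }) {
        if ($node->type eq 'text') {
            push @parts, $node->content;      # bare text between tags
        }
        elsif ($node->type eq 'tag' && $node->tag ne 'b') {
            push @parts, $node->all_text;     # text inside non-bold child elements
        }
    }
    return join '', @parts;
}

my $dom = Mojo::DOM->new($html);

# Apply the helper to each matched <td> and flatten the collection to a list
my @values = $dom->find('div#tabs-1 tr:nth-child(3) td')
                 ->map(sub { text_outside_bold($_) })
                 ->each;

print "$_\n" for @values;
```

Unlike the `text` method, this version would also keep text nested inside other (non-bold) child elements, which may matter if some of the hundreds of cells contain extra markup.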

1 Answer

From the `Mojo::DOM` documentation:

text

Extract text content from this element only (not including child elements).
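To make the difference concrete, here is a small sketch contrasting `text` with `all_text` on a single cell shaped like the ones in the question. It assumes `Mojo::DOM` is installed, and it uses a plain space instead of `&nbsp;` so the printed strings are easy to read.

```perl
use strict;
use warnings;
use Mojo::DOM;

my $td = Mojo::DOM->new('<td><b>Check-in Date: </b>04/20/2013</td>')->at('td');

# text: only the text nodes that are direct children of <td>
my $own = $td->text;        # "04/20/2013"

# all_text: text of <td> and all of its descendants, including <b>
my $all = $td->all_text;    # "Check-in Date: 04/20/2013"

print "text:     $own\n";
print "all_text: $all\n";
```

Because the date is a direct text child of each `<td>` while the label lives inside `<b>`, `text` alone yields just the date.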

Try giving this a shot:

(Granted, I don't really know Perl, so if I got the syntax wrong... sorry)

use feature 'say';   # say() must be enabled before it can be called
$results_dom->find('div#tabs-1 tr:nth-child(3) td')->each(sub { say $_->text });
Pat
  • Thank you for this. I tried ->text but apparently didn't wrap it correctly. – Not a machine Oct 18 '17 at 19:09
  • The tidiest way to do this is to use `map` to call the `text` method on each matching element, and `each` without parameters to convert the collection into an ordinary list. `my @vals = $results_dom->find('...')->map('text')->each` will leave the array containing the required text strings. – Borodin Oct 19 '17 at 05:41
  • By the way, you may prefer to use `tr:last-child` in place of `tr:nth-child(3)` if the third row is always the last row in the table. – Borodin Oct 19 '17 at 05:46
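Borodin's two suggestions combined might look like the sketch below: `map('text')` calls `text` on every matched `<td>`, `each` with no callback returns a plain list, and `tr:last-child` replaces `tr:nth-child(3)`. This assumes `Mojo::DOM` is installed and that the date row really is the last row of the table; the wrapper `id` is written as `tabs-1` to match the selector.

```perl
use strict;
use warnings;
use Mojo::DOM;

my $html = <<'HTML';
<div id="tabs-1">
<table width='650' class='subtblfont'>
    <tr><td>&nbsp;</td></tr>
    <tr><td>&nbsp;</td></tr>
    <tr>
        <td><b>Check-in Date:&nbsp;</b>04/20/2013</td>
        <td><b>Check-in Date:&nbsp;</b>04/25/2013</td>
    </tr>
</table>
</div>
HTML

my $results_dom = Mojo::DOM->new($html);

# map('text') calls ->text on each matched <td>; each() with no callback
# turns the resulting Mojo::Collection into an ordinary Perl list
my @vals = $results_dom->find('div#tabs-1 tr:last-child td')->map('text')->each;

print "$_\n" for @vals;
```

Collecting the strings into an array, rather than printing inside an `each` callback, leaves them available for further processing.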