-1

I need to grab the text inside a label with a specific class that has a checked radio input inside it.

This is the HTML:

<div id="ships-from2">

    <label for="ship_hk_intl">
        <input type="radio" name="ship_mode_name" id="ship_hk_intl" data-action="http://www.example.com/" value="hk_intl">
            Hong Kong Warehouse - USD44.31
    </label>

    <label for="ship_us_intl">
        <input type="radio" name="ship_mode_name" id="ship_us_intl" data-action="http://www.example.com/" checked value="us_intl">
            United States Warehouse - USD45.10
    </label>

</div>

I need the string inside the label that has a checked radio button. The actual radio button might change, so I need to check which one is checked.

I am scraping the DOM and using XPath but have no idea how to write the query. Any ideas?

EDIT 1 - CODE THUS FAR (response to @TimDev):

    $div        = $dom->getElementById('ships-from2');
    $query      = '//input[@checked]/../text()';
    $e          = $xpath->query($query, $div);
    echo 'TEST:'.trim($e->item(1)->nodeValue);
Sagive
  • Hey @hakre, this is not a duplicate question! I haven't submitted it twice, which is the meaning of duplicate; similar isn't duplicate. Also, please remember that if the title and body of a question are very different, Google or the Stack Overflow search engine won't return it as a relevant answer (meaning I couldn't find it). Thus closing this and downvoting is just an automated action you didn't really think about. – Sagive Jan 20 '16 at 06:46
  • I beg your pardon judging my actions. Despite according to your own comment the answer isn't nailing it but only coming close (and you spared us all the answer in the end), you perhaps should broaden your view about the meaning of "duplicate question" in context of this website. Meta should be the place where you can raise your voice about these topics and if there is anything wrong with this dupe close, you should see traction. – hakre Jan 20 '16 at 07:12
  • It's not that. First, I appreciate people trying to help and pointing me and others in the right direction; even if the answers aren't perfect they can still help. Second, this is not a duplicate (which is the issue here). I searched and searched before asking. If the title and body of a question are different, then many like me won't find the question by using search engines, which makes the needed variations important. In any case, thanks for your response. I personally think that the downvote was created for questions that are too localized, commercially driven, or just don't include enough information. – Sagive Jan 20 '16 at 07:30
  • Or too broad with no clear problem statement - whatever. What you write about the search engines and the different ways on how to word Q&A and a search query, having this question marked as a duplicate works equally well as not. Google will find "your" question here just fine. Just in case this was not obvious to you. – hakre Jan 20 '16 at 19:44

4 Answers

1

You might need to tweak the query a little, but it does return the radio input fields, and you can easily check them for the required attributes.

    $html='
        <div id="ships-from2">
            <label for="ship_hk_intl">
                <input type="radio" name="ship_mode_name" id="ship_hk_intl" data-action="http://www.example.com/" value="hk_intl">
                    Hong Kong Warehouse - USD44.31
            </label>
            <label for="ship_us_intl">
                <input type="radio" name="ship_mode_name" id="ship_us_intl" data-action="http://www.example.com/" checked value="us_intl">
                    United States Warehouse - USD45.10
            </label>
        </div>';

        $dom=new DOMDocument;
        $dom->loadHTML( $html );
        $xpath=new DOMXPath( $dom );
        $col=$xpath->query('//label/input');

        foreach( $col as $node ) if( $node->hasAttribute('checked') ) {
            echo $node->getAttribute('value').' '.$node->parentNode->nodeValue;
        }
        $dom=null;
        $xpath=null;
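One possible tweak, as a minimal sketch: let XPath do the filtering by selecting only the `<label>` that contains a checked `<input>`, then trim its text. The markup below is a shortened copy of the question's HTML, and the `$labels` and `$result` names are only illustrative.

```php
<?php
// Shortened sample markup mirroring the question's structure (assumed input).
$html = '
<div id="ships-from2">
    <label for="ship_hk_intl">
        <input type="radio" name="ship_mode_name" id="ship_hk_intl" value="hk_intl">
        Hong Kong Warehouse - USD44.31
    </label>
    <label for="ship_us_intl">
        <input type="radio" name="ship_mode_name" id="ship_us_intl" checked value="us_intl">
        United States Warehouse - USD45.10
    </label>
</div>';

$dom = new DOMDocument;
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

// Select only the <label> elements that have a checked <input> child.
$labels = $xpath->query('//label[input[@checked]]');

$result = null;
if ($labels->length > 0) {
    // textContent joins the label's text nodes; trim() drops the indentation.
    $result = trim($labels->item(0)->textContent);
}
echo $result, PHP_EOL;
```

With this approach there is no loop and no `hasAttribute()` call; if no radio is checked, `$result` simply stays `null`.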
Professor Abronsius
1

With XPath you can select the label that contains the checked input like this:

//input[@checked]/..

To get its text nodes:

//input[@checked]/../text()

function test(field) {
  console.log(field.parentElement.innerText);
}
<div id="ships-from2">

    <label for="ship_hk_intl">
        <input type="radio" onchange="test(this)" name="ship_mode_name" id="ship_hk_intl" data-action="http://www.example.com/" value="hk_intl">
            Hong Kong Warehouse - USD44.31
    </label>

    <label for="ship_us_intl">
        <input type="radio" onchange="test(this)"  name="ship_mode_name" id="ship_us_intl" data-action="http://www.example.com/" checked value="us_intl">
            United States Warehouse - USD45.10
    </label>

</div>
Raghavendra
0

I'm not sure why Raghavendra gives JavaScript, but here's a PHP example. He is right to use //input[@checked]/../text().

Note: ../text() returns two text nodes. It returns every text node inside the label around the input, which includes the whitespace between <label> and <input>.

That's why the snippet below takes the second node with $e->item(1)->nodeValue

$html = <<<EOC
<div>
    More HTML!
</div>
<div>
   Even more HTML!
</div>
<div id="ships-from2">
    <label for="ship_hk_intl">
        <input type="radio" name="ship_mode_name" id="ship_hk_intl" data-action="http://www.example.com/"
               value="hk_intl"/>
        Hong Kong Warehouse - USD44.31
    </label>
    <label for="ship_us_intl">
        <input type="radio" name="ship_mode_name" id="ship_us_intl" data-action="http://www.example.com/" checked
               value="us_intl"/>
        United States Warehouse - USD45.10
    </label>
</div>
EOC;

$doc = new DOMDocument();
$doc->loadHTML($html);
$xpathObject = new DOMXPath($doc);

$div = $doc->getElementById('ships-from2');
$query = '//input[@checked]/../text()';
$e = $xpathObject->query($query, $div);
echo trim($e->item(1)->nodeValue);
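A note on scoping, as a hedged sketch rather than a definitive fix: an XPath expression that starts with `//` searches from the document root even when a context node is passed to `query()`, so on a full page it can pick up checked inputs outside `#ships-from2`. Starting the expression with `.//` keeps the search inside `$div`. Adding `[normalize-space()]` to `text()` skips whitespace-only text nodes, so the code no longer depends on the result being at index 1. The extra `#other` block in the sample is hypothetical and only there to show the difference.

```php
<?php
// Hypothetical page: a second checked radio sits outside #ships-from2.
$html = <<<EOC
<div id="other">
    <label><input type="radio" name="other" checked value="x"> Unrelated option</label>
</div>
<div id="ships-from2">
    <label for="ship_hk_intl">
        <input type="radio" name="ship_mode_name" id="ship_hk_intl" value="hk_intl">
        Hong Kong Warehouse - USD44.31
    </label>
    <label for="ship_us_intl">
        <input type="radio" name="ship_mode_name" id="ship_us_intl" checked value="us_intl">
        United States Warehouse - USD45.10
    </label>
</div>
EOC;

$doc = new DOMDocument();
$doc->loadHTML($html);
$xpathObject = new DOMXPath($doc);

$div = $doc->getElementById('ships-from2');

// ".//" keeps the search inside $div; "//" would restart from the document root.
// text()[normalize-space()] keeps only text nodes that contain non-whitespace.
$query = './/input[@checked]/../text()[normalize-space()]';
$nodes = $xpathObject->query($query, $div);

$text = $nodes->length > 0 ? trim($nodes->item(0)->nodeValue) : null;
echo $text, PHP_EOL;
```

If the page is loaded from a file or URL instead of a string, only the loading step changes; the query stays the same.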
Timmetje
  • That's almost there ;) How do I target the div#ships-from2? The HTML above is only part of a complete page; see the edit, please. – Sagive Aug 14 '15 at 13:40
  • `$div = $doc->getElementById('ships-from2');` is correct – Timmetje Aug 14 '15 at 13:48
  • Then it returns the wrong data ;( I am so clueless, just testing every variation I can think of. – Sagive Aug 14 '15 at 13:50
  • Then post your HTML, because the above snippet works 100%. Btw, it also depends on how you actually load your HTML. If it's a file, use `$doc->loadHTMLFile($html);`. – Timmetje Aug 14 '15 at 13:51
-2

First of all it's not valid XML. Each attribute must have a value so replace

 ...data-action="http://www.example.com/" checked value="us_intl">

with

 ...data-action="http://www.example.com/" checked="true" value="us_intl" />

then your xpath will look like this:

  //input[@checked="true" and  @id="ship_us_intl"]
Joachim Weiß
  • Attributes with no values like `checked` or `disabled` are valid in HTML5. Also, OP is trying to scrape third-party HTML so "change the HTML" is not an answer. Finally, xpath expressions don't require attribute values either; `//input[@checked]` would do the job just fine. – lafor Aug 14 '15 at 12:12
  • Agreed. Beyond that, the answer is also strictly speaking incorrect if Joachim wants to make it super duper valid (WCAG triple platinum). We can go deeper with nitpicking: officially, `checked="true"` is also incorrect. If you look at the spec, it says `checked (checked)`, which means the checked attribute can have the value "checked". http://www.w3.org/TR/REC-html40/interact/forms.html#edef-INPUT – Timmetje Aug 14 '15 at 12:26
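
To illustrate the point made in these comments, here is a small sketch (the form markup is made up for the demonstration): the predicate `[@checked]` only tests that the attribute exists, so it matches both the bare `checked` and `checked="checked"`, while `[@checked="true"]` requires that exact value and matches neither.

```php
<?php
// Hypothetical markup: one input uses the bare attribute, one the long form.
$html = '
<form>
    <input type="radio" name="a" value="bare" checked>
    <input type="radio" name="b" value="long" checked="checked">
    <input type="radio" name="c" value="none">
</form>';

$dom = new DOMDocument;
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

// [@checked] tests only that the attribute exists, whatever its value.
$present = $xpath->query('//input[@checked]')->length;
// [@checked="true"] additionally requires that exact attribute value.
$equalsTrue = $xpath->query('//input[@checked="true"]')->length;

echo "present: $present, equals \"true\": $equalsTrue", PHP_EOL;
```

So when scraping third-party HTML, the existence test is the safer choice because it does not depend on how the page author wrote the attribute.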