How to retrieve data from HTML between <span> and </span>

Question

I want to get the rating (from 1 to 5) from Amazon customer reviews. I checked the page source, and the relevant part looks like this:

<div style="margin-bottom:0.5em;">
    <span style="margin-right:5px;"><span class="swSprite s_star_5_0 " title="5.0 out of 5 stars" ><span>5.0 out of 5 stars</span></span> </span>
    <span style="vertical-align:middle;"><b>Works great right out of the box with Surface Pro</b>, <nobr>October 5, 2013</nobr></span>
  </div>

I want to extract the text "5.0 out of 5 stars" from

<span>5.0 out of 5 stars</span></span> </span>

How can I use xpathSApply to get it?

Thank you!

score 7 · Accepted Answer · answered Feb 23 '14 at 02:19

I would recommend using the selectr package, which lets you use CSS selectors in place of XPath.

library(XML)
doc <- htmlParse('
  <div style="margin-bottom:0.5em;">
    <span style="margin-right:5px;">
     <span class="swSprite s_star_5_0 " title="5.0 out of 5 stars" >
      <span>5.0 out of 5 stars</span></span> </span>
     <span style="vertical-align:middle;">
     <b>Works great right out of the box with Surface Pro</b>, 
     <nobr>October 5, 2013</nobr></span>
  </div>', asText = TRUE
)

library(selectr)
xmlValue(querySelector(doc, 'div > span > span > span'))

UPDATE: If you would rather use XPath, the css_to_xpath function in selectr translates a CSS selector into the equivalent XPath expression, which in this case turns out to be

"descendant-or-self::div/span/span/span"
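Since the question asks about xpathSApply specifically, here is a minimal sketch of how the translated expression could be passed to it. The variable names (html, xpath, rating) are illustrative, and the fragment is the one from the question trimmed to the rating markup; a real Amazon page may need a more specific selector.

```r
library(XML)
library(selectr)

# Fragment from the question, trimmed to the rating markup (illustrative input).
html <- '<div style="margin-bottom:0.5em;">
  <span style="margin-right:5px;"><span class="swSprite s_star_5_0 " title="5.0 out of 5 stars"><span>5.0 out of 5 stars</span></span> </span>
</div>'
doc <- htmlParse(html, asText = TRUE)

# Translate the CSS selector into an XPath expression.
xpath <- css_to_xpath("div > span > span > span")

# Apply xmlValue to every node the expression matches.
rating <- xpathSApply(doc, xpath, xmlValue)
print(rating)
```

xpathSApply returns one element per matching node, so on a page with several reviews the result would be a character vector with one rating per review.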

score 1 · Answer 2 · answered Feb 23 '14 at 02:22

I do not know R well, but I can give you the XPath string. It seems you want the text of the first span that has no attributes, which would be:

//span[not(@*)][1]/text()

You can pass this string to xpathSApply.
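As a rough sketch of what that call might look like in R, assuming the XML package and the fragment from the question (the names html, rating_text and rating_value are illustrative). The last step, converting the text to a number, is an addition for convenience and not part of the XPath itself.

```r
library(XML)

# Fragment from the question, trimmed to the rating markup (illustrative input).
html <- '<div style="margin-bottom:0.5em;">
  <span style="margin-right:5px;"><span class="swSprite s_star_5_0 " title="5.0 out of 5 stars"><span>5.0 out of 5 stars</span></span> </span>
</div>'
doc <- htmlParse(html, asText = TRUE)

# Select the text node of the attribute-less span and read its value.
rating_text <- xpathSApply(doc, "//span[not(@*)][1]/text()", xmlValue)

# Optional: strip everything from " out of" onward and convert to numeric.
rating_value <- as.numeric(sub(" out of.*$", "", rating_text))
print(rating_text)
print(rating_value)
```

Note that this expression matches any span without attributes, so on a complete page it may also pick up unrelated spans; narrowing it to the review container is probably necessary in practice.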

therealmarv

