I have been struggling to get a piece of data using rvest. The value I am looking for is 20960, which sits inside OpenView(20960 ) in the onclick attribute of a link. How would I accomplish this with rvest?

An example section of the HTML I am working with is:

<tr class="row-1" align="left">
<td style="width:120px;">
<a href="#" onclick='OpenView(20960 );return false;'>
BAKER, JAIME EDWARD</a>
</td>
</tr>
thatsawinner
  • What code have you tried? Can't you just extract the onclick attribute? – MrFlick Mar 10 '16 at 05:43
  • Welcome to Stack Overflow! Please read the info about [how to ask a good question](http://stackoverflow.com/help/how-to-ask) and how to give a [reproducible example](http://stackoverflow.com/questions/5963269). This will make it much easier for others to help you. – zx8754 Mar 10 '16 at 07:10

1 Answer

I think this requires a little grepping...

library("rvest")
library("stringr")
read_html('<tr class="row-1" align="left">
<td style="width:120px;">
          <a href="#" onclick=\'OpenView(20960 );return false;\'>
          BAKER, JAIME EDWARD</a>
            </td>
            </tr>') %>% 
  html_nodes("a") %>% 
  html_attr("onclick") %>%
  str_extract("(?<=\\().*(?=\\))") %>%    # returns the stuff inside the parens
  str_trim(side="both")                   # trims whitespace from both sides
# [1] "20960"
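
If the real page has many such rows, the same idea might be extended along these lines. This is only a sketch under assumptions: the second row and its id are invented for illustration, and it swaps the look-around regex for str_match with a capture group that keeps just the digits, so surrounding spaces need no separate trimming.

```r
library(rvest)
library(stringr)

# Hypothetical two-row table modelled on the snippet in the question;
# the second row (name and id) is made up for illustration.
html <- '<table>
<tr class="row-1" align="left"><td style="width:120px;">
<a href="#" onclick=\'OpenView(20960 );return false;\'>BAKER, JAIME EDWARD</a>
</td></tr>
<tr class="row-2" align="left"><td style="width:120px;">
<a href="#" onclick=\'OpenView(31415 );return false;\'>DOE, JOHN</a>
</td></tr>
</table>'

links <- read_html(html) %>% html_nodes("a")

# One onclick string per link
onclick <- html_attr(links, "onclick")

# Column 2 of the str_match result holds the first capture group:
# the digits inside OpenView( ... ), with optional spaces around them
ids <- str_match(onclick, "OpenView\\(\\s*(\\d+)\\s*\\)")[, 2]

# Pair each link text with its id
result <- data.frame(
  name = str_trim(html_text(links)),
  id = as.integer(ids),
  stringsAsFactors = FALSE
)
print(result)
```

Links whose onclick does not match the pattern would come back as NA in the id column, which makes them easy to filter out afterwards.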
cory
  • That works. I hadn't considered needing to do some grepping after the html_attr. I'm still a bit new at this. Thank you for teaching me something new and for answering my question. Much appreciated. – thatsawinner Mar 12 '16 at 04:19