
I am trying to use the following code to get the occupation information from George Clooney's Wikipedia page. Eventually I would like to use a loop to get data on various personalities' occupations.

However, running the code below gives the following error:

Error in if (symbol != "role") symbol = NULL : argument is of length zero

I am not sure why this error keeps coming up.

```r
library(XML)
library(plyr)
url = 'http://en.wikipedia.org/wiki/George_Clooney'

# don't forget to parse the HTML, doh!
doc = htmlParse(url)

# get every table cell:
links = getNodeSet(doc, '//table/tr/td')

# make a data.frame for each cell with non-blank text and a 'class' attribute of 'role':
df = ldply(links, function(x) {
  text = xmlValue(x)
  if (text == '') text = NULL
  symbol = xmlGetAttr(x, 'class')
  if (symbol != 'role') symbol = NULL
  if (!is.null(text) & !is.null(symbol))
    data.frame(symbol, text)
})
```
zx8754
user1496289
    Debugging advice: http://stackoverflow.com/a/5156351/636656 . Specifically, try `options(error=recover)` here. – Ari B. Friedman Jul 02 '12 at 14:34
    the problem is most likely that `symbol` is `NULL`. See what happens with `if(NULL != "role") print('test')`. Something like this should work, although I didn't run your code: `if (!is.null(symbol) && symbol != 'role') symbol <- NULL` – GSee Jul 02 '12 at 14:41
  • Use `col.names = my_column_names` in `kable()`, with `my_column_names` being a character vector of the names you want; it worked for me! – Benjamin Telkamp Dec 05 '17 at 11:49
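GSee's diagnosis can be reproduced without any scraping. The base-R sketch below assumes only that `xmlGetAttr(x, 'class')` returned `NULL` for a table cell with no `class` attribute; `is_role()` is a hypothetical helper for illustration, not part of the XML package.

```r
# Minimal reproduction, assuming xmlGetAttr(x, 'class') returned NULL
# because the table cell had no class attribute.
symbol <- NULL

# Comparing NULL with a string gives a zero-length logical, not FALSE.
cmp <- symbol != "role"
print(length(cmp))

# if() needs exactly one TRUE/FALSE, so a zero-length condition is an error
# (the "argument is of length zero" message from the question).
res <- tryCatch(if (cmp) "not role" else "role",
                error = function(e) conditionMessage(e))
print(res)

# GSee's guard: && short-circuits, so symbol != "role" is never evaluated
# when symbol is NULL, and the if() condition is always a single value.
if (!is.null(symbol) && symbol != "role") symbol <- NULL

# Alternative (hypothetical helper): isTRUE() turns logical(0) into FALSE.
is_role <- function(symbol) isTRUE(symbol == "role")
print(is_role(NULL))
print(is_role("role"))
print(is_role("other"))
```

The `&&` guard keeps the original structure of the loop body, while an `isTRUE()` check folds the `NULL` case and the "wrong class" case into one test.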

2 Answers


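As @gsee mentioned, you need to check that `symbol` isn't `NULL` before you check its value. `xmlGetAttr()` returns `NULL` for cells that have no `class` attribute, `NULL != 'role'` evaluates to `logical(0)`, and `if` fails when its condition has length zero. Here's a minor update to your code that works (at least for George).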

```r
df = ldply(
  links, 
  function(x) 
  {
    text = xmlValue(x)
    if (!nzchar(text)) text = NULL
    symbol = xmlGetAttr(x, 'class')
    # check for NULL first; && skips the comparison when symbol is NULL
    if (!is.null(symbol) && symbol != 'role') symbol = NULL
    if (!is.null(text) && !is.null(symbol))
      data.frame(symbol, text)
  } 
)
```
Richie Cotton
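For the loop over several people that the question mentions, one option is to let XPath do the filtering, so cells without a `class` attribute are never returned and no `NULL` check is needed. This is a sketch, not a tested scraper: `get_roles()`, `scrape_roles()`, and the `people` titles are hypothetical, and it assumes the pages are still reachable at these URLs and still mark occupations with `class="role"`, neither of which is guaranteed today.

```r
library(XML)

# Hypothetical helper: text of every table cell whose class attribute is "role".
# Filtering on @class inside the XPath means cells without a class are never
# returned, so there is no NULL to guard against. '//table//td' (rather than
# '//table/tr/td') also matches rows wrapped in <tbody>.
get_roles <- function(doc) {
  cells <- getNodeSet(doc, "//table//td[@class = 'role']")
  as.character(unlist(lapply(cells, xmlValue)))
}

# Hypothetical loop over several pages; `people` holds Wikipedia page titles.
scrape_roles <- function(people, base_url = "http://en.wikipedia.org/wiki/") {
  rows <- lapply(people, function(person) {
    doc <- htmlParse(paste0(base_url, person))
    roles <- get_roles(doc)
    if (length(roles) == 0) return(NULL)  # skip pages with no matching cells
    data.frame(person = person, text = roles, stringsAsFactors = FALSE)
  })
  do.call(rbind, rows)  # stack the per-person data frames
}
```

Usage would look like `all_roles <- scrape_roles(c("George_Clooney", "Brad_Pitt"))`. Wikipedia now redirects to https, which `htmlParse()` may not fetch on its own; in that case, downloading the text first (for example `paste(readLines(url, warn = FALSE), collapse = "\n")`) and parsing it with `htmlParse(txt, asText = TRUE)` is one workaround.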

Use `col.names = my_column_names` in `kable()`, with `my_column_names` being a character vector of the names you want; it worked for me!

maycca
Benjamin Telkamp
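The `kable()` answer above concerns table output rather than the scraping error, but the option it names is a real argument of `knitr::kable()`. A minimal sketch, using a hypothetical data frame shaped like the scraping result (columns `symbol` and `text`):

```r
library(knitr)

# Hypothetical data frame shaped like the scraping result (columns symbol, text)
df <- data.frame(symbol = c("role", "role"), text = c("Actor", "Director"),
                 stringsAsFactors = FALSE)

# col.names replaces the header row; supply one name per column, in order
my_column_names <- c("Class", "Occupation")
tab <- kable(df, col.names = my_column_names)
print(tab)
```

This only changes the displayed header; `names(df)` is left untouched.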