I am populating a new variable in a data frame based on string conditions on another variable. I receive the following error message:

Error in Source == "httpWWW.BGDAILYNEWS.COM" | Source == : operations are possible only for numeric, logical or complex types

My code is as follows:

County <- ifelse(Source == 'httpWWW.BGDAILYNEWS.COM' | 'WWW.BGDAILYNEWS.COM', 'Warren', ifelse(Source == 'httpWWW.HCLOCAL.COM' | 'WWW.HCLOCAL.COM', 'Henry', ifelse(Source == 'httpWWW.KENTUCKY.COM' | 'WWW.KENTUCKY.COM', 'Fayette', ifelse(Source == 'httpWWW.KENTUCKYNEWERA.COM' | 'WWW.KENTUCKYNEWERA.COM', 'Christian') )))

NiuBiBang
  • You need to make a comparison after every `|`. For example, use `ifelse(Source=='foo1' | Source=='foo2', return1, return2)` instead of `ifelse(Source=='foo1' | 'foo2', return1, return2)` – ialm Jul 25 '13 at 18:50
  • Yeah, you're right & I missed a few comparator conditions which is why the code failed. Nevertheless, I should try a cleaner technique... – NiuBiBang Jul 26 '13 at 17:44
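
For reference, here is a sketch of what the original nested `ifelse` might look like once every `|` has a full comparison on both sides. The sample `Source` vector is made up for illustration, and the final `NA_character_` fallback is an assumption, since the original call gave no value for unmatched sources.

```r
# Hypothetical sample data; the real Source column is not shown in the question.
Source <- c("httpWWW.BGDAILYNEWS.COM", "WWW.HCLOCAL.COM",
            "WWW.KENTUCKY.COM", "httpWWW.KENTUCKYNEWERA.COM",
            "WWW.EXAMPLE.COM")  # last value is deliberately unmatched

# Each side of `|` is a complete logical comparison.
County <- ifelse(Source == "httpWWW.BGDAILYNEWS.COM" | Source == "WWW.BGDAILYNEWS.COM", "Warren",
          ifelse(Source == "httpWWW.HCLOCAL.COM" | Source == "WWW.HCLOCAL.COM", "Henry",
          ifelse(Source == "httpWWW.KENTUCKY.COM" | Source == "WWW.KENTUCKY.COM", "Fayette",
          ifelse(Source == "httpWWW.KENTUCKYNEWERA.COM" | Source == "WWW.KENTUCKYNEWERA.COM", "Christian",
                 NA_character_))))  # assumed fallback for anything else

print(County)
```

This runs, but it is still hard to read and easy to get wrong, which is why the answers below suggest other techniques.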

2 Answers

I suggest you break down that deeply nested ifelse statement into more manageable chunks.

But the error is telling you that you cannot use | like that. 'a' | 'b' doesn't make sense, since | is a logical operator and its operands must be logical (or numeric) values, not character strings. Instead use %in%:

Source %in% c('httpWWW.BGDAILYNEWS.COM', 'WWW.BGDAILYNEWS.COM')
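
To make that concrete, here is a small sketch of what `%in%` returns. The `Source` values are invented sample data and `is_warren` is just an illustrative name.

```r
# Hypothetical sample values for illustration only.
Source <- c("httpWWW.BGDAILYNEWS.COM", "WWW.BGDAILYNEWS.COM", "WWW.HCLOCAL.COM")

# %in% tests each element of Source for membership in the right-hand set
# and returns one TRUE/FALSE per element.
is_warren <- Source %in% c("httpWWW.BGDAILYNEWS.COM", "WWW.BGDAILYNEWS.COM")

print(is_warren)
```

The result is a logical vector the same length as `Source`, so it can be used as the test in `ifelse` or as a subsetting index.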

If I understand what you're doing, you would be much better off using multiple assignments:

County <- vector(mode='character', length=length(Source))
County[Source %in% c('httpWWW.BGDAILYNEWS.COM', 'WWW.BGDAILYNEWS.COM')] <- 'Warren'
etc.
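
Spelled out for all four counties from the question, the multiple-assignment idea might look like the sketch below. The sample `Source` vector is hypothetical, and starting from `NA_character_` rather than empty strings is my own choice so that unmatched sources stand out.

```r
# Hypothetical sample data; the question does not show the real Source values.
Source <- c("httpWWW.BGDAILYNEWS.COM", "WWW.HCLOCAL.COM",
            "WWW.KENTUCKY.COM", "httpWWW.KENTUCKYNEWERA.COM",
            "WWW.EXAMPLE.COM")  # last value is deliberately unmatched

# Assumption: start with NA so unmatched sources are easy to spot.
County <- rep(NA_character_, length(Source))

# One assignment per county; the index comes from Source, the target is County.
County[Source %in% c("httpWWW.BGDAILYNEWS.COM", "WWW.BGDAILYNEWS.COM")] <- "Warren"
County[Source %in% c("httpWWW.HCLOCAL.COM", "WWW.HCLOCAL.COM")] <- "Henry"
County[Source %in% c("httpWWW.KENTUCKY.COM", "WWW.KENTUCKY.COM")] <- "Fayette"
County[Source %in% c("httpWWW.KENTUCKYNEWERA.COM", "WWW.KENTUCKYNEWERA.COM")] <- "Christian"

print(County)
```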

You can also use a switch statement for this type of thing:

myfun <- function(x) {
  switch(x,
         'httpWWW.BGDAILYNEWS.COM'='Warren',
         'httpWWW.HCLOCAL.COM'='Henry',
         etc...)
}

Then you want to do a simple apply (sapply) passing each element in Source to myfun:

County = sapply(Source, myfun)
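
A fuller sketch of the `switch` approach, using the site names from the question, could look like this. The function name `source_to_county`, the sample `Source` vector, and the `NA_character_` default are assumptions for illustration. An empty alternative (`= ,`) falls through to the next non-empty one, and the trailing unnamed argument is the default for unmatched input.

```r
# Hypothetical helper name; maps one source string to a county.
source_to_county <- function(x) {
  switch(x,
         "httpWWW.BGDAILYNEWS.COM"    = ,            # falls through to the next value
         "WWW.BGDAILYNEWS.COM"        = "Warren",
         "httpWWW.HCLOCAL.COM"        = ,
         "WWW.HCLOCAL.COM"            = "Henry",
         "httpWWW.KENTUCKY.COM"       = ,
         "WWW.KENTUCKY.COM"           = "Fayette",
         "httpWWW.KENTUCKYNEWERA.COM" = ,
         "WWW.KENTUCKYNEWERA.COM"     = "Christian",
         NA_character_)                              # assumed default for anything else
}

# Hypothetical sample data.
Source <- c("httpWWW.BGDAILYNEWS.COM", "WWW.HCLOCAL.COM", "WWW.EXAMPLE.COM")

# switch() handles one value at a time, so apply it element by element.
# vapply is used here instead of sapply to guarantee a character vector.
County <- vapply(Source, source_to_county, character(1), USE.NAMES = FALSE)

print(County)
```

Without a default, `switch` returns `NULL` for unmatched input and `sapply` would then hand back a list, which is why the sketch includes one.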

Or finally, you can use factors and levels, but I'll leave that as an exercise to the reader...
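
For the curious, one possible way to do the factor version is sketched below. It relies on `factor()` accepting duplicated `labels`, which as far as I know requires R 3.4.0 or later. The names `sites` and `counties` and the sample `Source` vector are made up for illustration.

```r
# Hypothetical sample data.
Source <- c("httpWWW.BGDAILYNEWS.COM", "WWW.BGDAILYNEWS.COM",
            "WWW.HCLOCAL.COM", "WWW.EXAMPLE.COM")

# Every known source string, and the county each one maps to (same order).
sites <- c("httpWWW.BGDAILYNEWS.COM", "WWW.BGDAILYNEWS.COM",
           "httpWWW.HCLOCAL.COM", "WWW.HCLOCAL.COM",
           "httpWWW.KENTUCKY.COM", "WWW.KENTUCKY.COM",
           "httpWWW.KENTUCKYNEWERA.COM", "WWW.KENTUCKYNEWERA.COM")
counties <- rep(c("Warren", "Henry", "Fayette", "Christian"), each = 2)

# Duplicated labels collapse several levels into one (assumes R >= 3.4.0).
# Values of Source that are not in `sites` become NA.
County <- as.character(factor(Source, levels = sites, labels = counties))

print(County)
```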

Justin
  • +1 for `switch`. If you find yourself using more than 2 `ifelse`'s, you should probably be using `switch` or `cut`. – Señor O Jul 25 '13 at 19:20
  • @Justin, sorry to ask but can you post a more specific example for the `switch` function? Specifically, how do I create the `County` variable from the `Source` variable using `switch`? Much agreed that my `ifelse`s are not parsimonious. – NiuBiBang Jul 26 '13 at 18:42
A different approach:

county <- c("Warren","Henry","Fayette","Christian")
sites <- c("WWW.BGDAILYNEWS.COM","WWW.HCLOCAL.COM","WWW.KENTUCKY.COM","WWW.KENTUCKYNEWERA.COM")
County <- county[match(gsub("^http","",Source), sites)]

This will return NA for strings that do not match any of the given inputs.
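
To show what each step does, here is the same idea broken into pieces on some made-up input. The sample `Source` vector and the intermediate names `stripped` and `idx` are only for illustration.

```r
county <- c("Warren", "Henry", "Fayette", "Christian")
sites <- c("WWW.BGDAILYNEWS.COM", "WWW.HCLOCAL.COM",
           "WWW.KENTUCKY.COM", "WWW.KENTUCKYNEWERA.COM")

# Hypothetical sample data; the last value matches no known site.
Source <- c("httpWWW.BGDAILYNEWS.COM", "WWW.HCLOCAL.COM", "WWW.EXAMPLE.COM")

# Step 1: drop a leading "http" so both spellings collapse to one.
stripped <- gsub("^http", "", Source)

# Step 2: position of each stripped string in `sites` (NA if absent).
idx <- match(stripped, sites)

# Step 3: use those positions to pick the county; an NA index gives NA.
County <- county[idx]

print(County)
```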

Using hadley's suggestion of a lookup table with character subsetting:

lookup <- c(WWW.BGDAILYNEWS.COM="Warren", WWW.HCLOCAL.COM="Henry", WWW.KENTUCKY.COM="Fayette", WWW.KENTUCKYNEWERA.COM="Christian")
County <- unname(lookup[gsub("^http","",Source)])
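
Since the question is about a data frame, here is a sketch of the lookup table applied to a column. The data frame `df` and its contents are hypothetical; `sub` is used instead of `gsub` because a `^` anchor can only match once per string anyway.

```r
lookup <- c(WWW.BGDAILYNEWS.COM = "Warren", WWW.HCLOCAL.COM = "Henry",
            WWW.KENTUCKY.COM = "Fayette", WWW.KENTUCKYNEWERA.COM = "Christian")

# Hypothetical data frame standing in for the asker's data.
df <- data.frame(Source = c("httpWWW.KENTUCKY.COM", "WWW.KENTUCKYNEWERA.COM",
                            "WWW.EXAMPLE.COM"),
                 stringsAsFactors = FALSE)

# Index the named vector by the cleaned-up source strings.
# Names missing from `lookup` yield NA; unname() drops the leftover names.
df$County <- unname(lookup[sub("^http", "", df$Source)])

print(df)
```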
Ferdinand.kraft
  • You could also eliminate the `match` and use character subsetting directly: https://github.com/hadley/devtools/wiki/Subsetting#lookup-tables-character-subsetting – hadley Jul 26 '13 at 13:13