Imagine a vector `animals <- c("dog", "wolf", "cat")` that is then converted to a data frame with `animals <- as.data.frame(animals)`.

I'd like to create a new column that looks like this: `animals$dogs <- c("dog", "dog", "cat")`

Is there a dplyr function that will perform this operation?

gh0strider18
  • What's the logic behind that? Take the first element twice and then the third? – talat Dec 30 '14 at 01:25
  • if animals = "dog" then dogs = "dog"; if animals = "wolf" then dogs = "dog"; if animals = "cat" then dogs = "cat" – gh0strider18 Dec 30 '14 at 01:31
  • What about `ifelse(animals %in% c("dog","wolf"),"dog","cat")`? – Ben Bolker Dec 30 '14 at 01:35
  • `ifelse` is your friend for such problems, as shown by Ben Bolker. It takes vector input as opposed to `if`... `else` constructs. Read `?ifelse` – talat Dec 30 '14 at 01:39
  • sure - that works... wondering if there is a `dplyr` method as I am transitioning a lot of my cleaning/scrubbing data processes to using the library. – gh0strider18 Dec 30 '14 at 01:39
  • `dplyr` is a package specially created for working with data.frame-like objects, not atomic vectors. – talat Dec 30 '14 at 01:40
  • whoops, @docendodiscimus, I'll edit my question... – gh0strider18 Dec 30 '14 at 01:44
  • If it was a factor vector, you could change the factor levels for wolf to dog. – talat Dec 30 '14 at 01:45
  • That's another great way @docendodiscimus - however I'm wondering about a dplyr solution - I should probably change my title to reflect that... sorry. – gh0strider18 Dec 30 '14 at 01:47
  • You should probably specify why it needs to be dplyr. If there really is no reason other than you think it will provide the fastest/best solution, then don't mention it at all; if it is the best, people will provide a solution like that. If you were asking a question about how to do something around the house, you wouldn't put some restriction on it before you knew the possible solutions: "I need to open a door, please use a chisel in your solution". There are plenty of good non-chisel solutions. – Dason Dec 30 '14 at 03:54
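
The factor-level suggestion from the comments above can be sketched as follows. This is only an illustration and assumes the values are stored as a factor, which the question does not state; `dogs` is the new variable name from the question.

```r
# Assumption: the animals are stored as a factor, not a character vector.
animals <- factor(c("dog", "wolf", "cat"))

# Copy the factor, then rename the "wolf" level to "dog".
# Assigning a duplicate level name merges the two levels.
dogs <- animals
levels(dogs)[levels(dogs) == "wolf"] <- "dog"

print(as.character(dogs))
print(levels(dogs))
```

Because only the level labels change, the underlying data is never touched element by element.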

2 Answers

animals <- data.frame(animals=c("dog", "wolf", "cat"))

I believe the dplyr idiom would be:

library("dplyr")
animals %>% mutate(dogs=ifelse(animals %in% c("dog","wolf"),
                                  "dog",
                                  "cat"))

You could also use car::recode() for this.

library("car")
# Call car::recode() explicitly: dplyr also exports a recode() with a different interface
animals %>% mutate(dogs=car::recode(animals,"c('dog','wolf')='dog'"))
Ben Bolker
  • You could also use `replace` instead of `ifelse`. Think I remember seeing a benchmark where that was faster than `ifelse`. – talat Dec 30 '14 at 07:22
  • The reference I was thinking of is [here](http://stackoverflow.com/questions/23642811/replace-parts-of-a-variable-using-numeric-indices-in-dplyr-do-i-need-to-create). – talat Dec 30 '14 at 07:50
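
A minimal sketch of the `replace` idea from the comment above. It assumes the column is a character vector (hence `stringsAsFactors = FALSE`); `dogs_base` and `result` are illustrative names, and the dplyr part only runs if the package is installed.

```r
animals <- data.frame(animals = c("dog", "wolf", "cat"),
                      stringsAsFactors = FALSE)

# Base R: replace() returns a copy with only the matching entries changed.
dogs_base <- replace(animals$animals, animals$animals == "wolf", "dog")
print(dogs_base)

# The same expression inside mutate(), guarded so the sketch still runs
# when dplyr is not available.
if (requireNamespace("dplyr", quietly = TRUE)) {
  result <- dplyr::mutate(animals,
                          dogs = replace(animals, animals == "wolf", "dog"))
  print(result)
}
```

Unlike the `ifelse` version, this leaves any value other than "wolf" untouched instead of mapping it to "cat".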

The fastest way I can think of is to subset the vector or data.frame and replace the "wolf" entries directly without ifelse:

animals <- c("dog", "wolf", "cat")
dogs <- animals
dogs[dogs == "wolf"] <- "dog"

Or in case of a data.frame:

animals <- data.frame(animals=c("dog", "wolf", "cat"))
animals$dog <- animals$animals
animals$dog[animals$dog == "wolf"] <- "dog"

The advantage should be that you're only modifying a subset of the data instead of the whole vector. If your data is small, it probably won't make a difference or could even be slower than ifelse, but for a larger vector I believe it would perform better (not benchmarked, though).
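
Since the claim above is not benchmarked, here is a rough way to check it yourself. It is only a sketch: the helper names `via_ifelse` and `via_subset` and the vector size are my own choices, and timings will vary by machine and R version.

```r
set.seed(1)
# Assumed test data: one million values drawn from the three animals.
x <- sample(c("dog", "wolf", "cat"), 1e6, replace = TRUE)

# The ifelse approach from the other answer.
via_ifelse <- function(v) ifelse(v %in% c("dog", "wolf"), "dog", "cat")

# The subset-and-assign approach from this answer.
via_subset <- function(v) {
  v[v == "wolf"] <- "dog"
  v
}

# Both approaches agree on this data.
stopifnot(identical(via_ifelse(x), via_subset(x)))

# Rough timings; use a benchmarking package for anything serious.
print(system.time(via_ifelse(x)))
print(system.time(via_subset(x)))
```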

talat