
I have a dataframe called "data":

data <-
            x                y 
            green-dog        3
            blue-dog         4
            red-cat          5
            yellow-cat       6

I need to create a new variable called "type", like this:

data <-
                x                y      type
                green-dog        3      dog
                blue-dog         4      dog
                red-cat          5      cat
                yellow-cat       6      cat
    I answered this and hope it helps! For next time you'll get better results with a reproducible example (so I shouldn't have to write code to define "data" for instance) - see http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example. It would also have been good to know if this is your actual problem, or just a cut down version of a more complex problem. And what you'd tried so far and got stuck on. – Peter Ellis Jan 04 '17 at 23:32
    To complement Peter's comment: putting `data <- read.table(header=T, text="` in front of your data table and `")` after it makes it easier for everyone to copy and paste it into their R environment. – lukeA Jan 04 '17 at 23:43

2 Answers


There are many ways to do this, but this is the simplest if there are only two categories:

data <- data.frame(
    x = c("green-dog", "blue-dog", "red-cat", "yellow-cat"),
    y = 3:6)

data$type <- ifelse(grepl("dog", data$x), "dog", "cat")

Note that, as written, anything without "dog" in it becomes "cat", even if "cat" isn't there either. Things to consider:

  • how to handle NAs?
  • should you have an explicit check for "cat", or can it just be the default (not dog) option?
  • what about upper/lower case?
  • do you want to capture the colour too?

If you need something more complex, I'd suggest checking out the stringr package.
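
As a rough sketch of how those considerations might be handled in base R (before reaching for stringr), the example below uses an explicit check for each category, ignores case, leaves unmatched or missing values as NA, and captures the colour. The extra rows ("Blue-DOG", "yellow-fish", NA) and the `colour` column are made up for illustration and are not part of the original data.

```r
# Illustrative data: mixed case, an unknown animal and a missing value
data <- data.frame(
    x = c("green-dog", "Blue-DOG", "red-cat", "yellow-fish", NA),
    y = 3:7,
    stringsAsFactors = FALSE)

# Start with NA so anything that matches neither category stays NA
data$type <- NA_character_

# grepl() returns FALSE for NA input, so missing values are left alone.
# If a value contained both words, the later assignment ("cat") would win.
data$type[grepl("dog", data$x, ignore.case = TRUE)] <- "dog"
data$type[grepl("cat", data$x, ignore.case = TRUE)] <- "cat"

# Colour: drop everything from the first hyphen onwards (NA stays NA)
data$colour <- sub("-.*", "", data$x)
```

Whether unmatched values should be NA or fall into a default category depends on your data; this is only one reasonable choice.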

Peter Ellis

We can do this easily with `sub`, removing everything up to and including the last hyphen (the `.*` is greedy):

data$type <-  sub(".*-", "", data$x)
data$type
#[1] "dog" "dog" "cat" "cat"
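
A small sketch of how this pattern behaves on less tidy input, using made-up values that are not in the original data: a string with more than one hyphen, a string with no hyphen, and an NA. It also shows one possible way to recover the colour part, which is an addition and not part of the answer above.

```r
# Hypothetical inputs to probe the edge cases
x <- c("green-dog", "light-blue-dog", "cat", NA)

# Greedy ".*-" strips through the LAST hyphen; a string with no hyphen
# is returned unchanged, and NA stays NA
type <- sub(".*-", "", x)

# Colour: strip the last hyphen and what follows it, but only when a
# hyphen is present; otherwise there is no colour to report
colour <- ifelse(grepl("-", x), sub("-[^-]*$", "", x), NA)
```

Note that a value with no hyphen passes through `sub` untouched, so it ends up as its own type; check for that if your data may contain such values.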
akrun