
My dataset contains a column of states, and I would like to add a new variable (column) called Region that groups them: Pacific (Oregon, California, Washington), Rocky Mountains (Nevada, Montana, Idaho), etc.

I am confused about where to go from here. Any ideas?

Deborah_Watson
  • You may start by making a [reproducible example](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example). – nrussell Dec 10 '15 at 13:25
  • What do you have and what do you want? Is the new column dependent on values in the columns you already have? – nist Dec 10 '15 at 13:32
  • If state = Calif, Oregon, Washington, then region = Pacific. I have the states listed as a column. – Deborah_Watson Dec 10 '15 at 13:46
  • Where is your information about the correspondence between states and regions coming from? – Ben Bolker Dec 10 '15 at 13:54
  • It would be easier with an example. But for that you can use ifelse() and %in% together. – Roman Dec 10 '15 at 13:54
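
The ifelse() and %in% suggestion in the last comment could look roughly like the sketch below. The data frame, the column name `state`, and the exact membership of each region are assumptions made for illustration, since the question did not include any data.

```r
# Hypothetical example data; the question did not include any.
df <- data.frame(state = c("California", "Oregon", "Washington",
                           "Nevada", "Montana", "Idaho", "Texas"),
                 stringsAsFactors = FALSE)

# Assumed region membership, taken from the groupings named in the question.
pacific <- c("California", "Oregon", "Washington")
rocky   <- c("Nevada", "Montana", "Idaho")

# Nested ifelse(): states that match neither group get NA.
df$Region <- ifelse(df$state %in% pacific, "Pacific",
             ifelse(df$state %in% rocky, "Rocky Mountains", NA_character_))

df
```

Nesting ifelse() gets unwieldy beyond a handful of regions; at that point a lookup table and a join are usually easier to maintain.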

2 Answers


The classic way to do this would be with merge(), or (since you added the tidyr tag, so you're in the "Hadleyverse") dplyr::full_join(). Assuming you have one data frame with states and other data:

d1 <- data.frame(state=c("Alaska","Massachusetts",
                 "Massachusetts","Florida"),
                 other_stuff=1:4)

and another data frame containing the matches between the states and their regions:

d2 <- data.frame(state=c("Alaska","Massachusetts","Florida"),
                 region=c("Western","Northeast","Southeast"))

Then

library("dplyr")
d1 %>% full_join(d2,by="state")

should do what you want.

But it's up to you to figure out where to get d2, or the equivalent information, from.
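
One possible source, assuming the U.S. Census Bureau groupings are acceptable, is the `state.name` and `state.division` vectors that ship with base R's datasets package. This is only a sketch: the Census divisions use names such as "Pacific" and "Mountain", and their membership may not match the regions you have in mind (Alaska and Hawaii, for example, fall under "Pacific"). It uses base merge() with all.x = TRUE so that every row of d1 is kept.

```r
# Lookup table built from datasets shipped with base R.
d2 <- data.frame(state  = state.name,
                 region = as.character(state.division),
                 stringsAsFactors = FALSE)

# Same example data as d1 above.
d1 <- data.frame(state = c("Alaska", "Massachusetts",
                           "Massachusetts", "Florida"),
                 other_stuff = 1:4,
                 stringsAsFactors = FALSE)

# all.x = TRUE keeps every row of d1, even if a state has no match in d2.
# Note that merge() sorts the result by the key column.
merged <- merge(d1, d2, by = "state", all.x = TRUE)
merged
```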

Ben Bolker

Since you did not provide your data, I assume it looks something like this:

df <- data.frame(state = c("Alabama", "Alaska", "Arizona", "Arkansas", "California", "Oregon", "Washington"))

I suppose you have a column in your data.frame (in this case called df$state) that has information on the state. You can create a new variable called region like this:

df$region[df$state %in% c("California", "Oregon", "Washington")] <- "Pacific"

df
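
If there are many states to assign, repeating one such line per region gets tedious. A named character vector used as a lookup table is one alternative; the sketch below assumes the same `df` as above, and the `lookup` name is made up for illustration. The as.character() call matters when `state` is a factor, because indexing by a factor would use its integer codes rather than its labels.

```r
df <- data.frame(state = c("Alabama", "Alaska", "Arizona", "Arkansas",
                           "California", "Oregon", "Washington"))

# Hypothetical lookup: names are states, values are regions.
lookup <- c(California = "Pacific",
            Oregon     = "Pacific",
            Washington = "Pacific")

# States missing from the lookup come back as NA.
df$region <- unname(lookup[as.character(df$state)])

df
```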
maller