My dataset contains states, and I would like to add a new variable (column) called Region, for example Pacific: Oregon, California, Washington; Rocky Mountains: Nevada, Montana, Idaho; etc.
I am confused about where to go from here. Any ideas?
The classic way to do this would be with merge(), or (since you added the tidyr tag, so you're in the "Hadleyverse") dplyr::full_join(). Assuming you have one data frame with states and other data:
d1 <- data.frame(state=c("Alaska","Massachusetts",
"Massachusetts","Florida"),
other_stuff=1:4)
and another data frame containing the matches between the states and their regions:
d2 <- data.frame(state=c("Alaska","Massachusetts","Florida"),
region=c("Western","Northeast","Southeast"))
Then
library("dplyr")
d1 %>% full_join(d2,by="state")
should do what you want.
But it's up to you to figure out where to get d2, or the equivalent information, from.
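One possible source for the lookup table, offered only as a sketch: base R ships the vectors state.name and state.division (in the datasets package), which pair each of the 50 states with its US Census division. This assumes the Census divisions ("Pacific", "Mountain", and so on) are close enough to the regions you want; if your grouping differs, you would have to edit the table by hand. The example states and the name result are made up for illustration, and merge() is used here instead of full_join() so that the snippet needs no extra packages.

```r
# Lookup table built from vectors that ship with base R (datasets package).
# Assumption: Census divisions are an acceptable stand-in for your regions.
d2 <- data.frame(state  = state.name,
                 region = as.character(state.division),
                 stringsAsFactors = FALSE)

# Hypothetical data, standing in for your real data frame
d1 <- data.frame(state = c("Oregon", "Nevada", "California", "Montana"),
                 other_stuff = 1:4,
                 stringsAsFactors = FALSE)

# all.x = TRUE keeps every row of d1, even for a state with no match in d2
result <- merge(d1, d2, by = "state", all.x = TRUE)
result
```

Note that merge() sorts the result by the key column, so the rows may come back in a different order than in d1.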
Since you did not provide your data, I assume it looks something like this:
df <- data.frame(state = c("Alabama", "Alaska", "Arizona", "Arkansas", "California", "Oregon", "Washington"))
I suppose you have a column in your data.frame (in this case called df$state) that has information on the state. You can create a new variable called region like this:
df$region[df$state == "California" | df$state == "Oregon" ] <- "Pacific"
df
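With more than a couple of states per region, chaining == comparisons with | gets long. A variation on the same idea, as a sketch, uses %in% against a vector of state names. The vectors pacific and rocky are hypothetical names introduced here, filled with the groupings from the question; states that belong to neither group stay NA.

```r
df <- data.frame(state = c("Alabama", "Alaska", "Arizona", "Arkansas",
                           "California", "Oregon", "Washington"))

# Hypothetical groupings, taken from the regions listed in the question
pacific <- c("Oregon", "California", "Washington")
rocky   <- c("Nevada", "Montana", "Idaho")

# Start with an all-NA character column, then fill in each region
df$region <- NA_character_
df$region[df$state %in% pacific] <- "Pacific"
df$region[df$state %in% rocky]   <- "Rocky Mountains"
df
```

For a full set of 50 states, a lookup table joined with merge() is probably easier to maintain than one assignment per region.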