1

I have a data frame (df) that lists the country associated with every site:

Site          Country
Site1         USA
Site2         Vietnam
Site3         Spain
Site4         Germany
Site5         China

I want to add a column that associates each country with its corresponding continent. I wrote a simple if loop to do this:

df$Continent <- NA
if(df$Country == "USA" |df$Country ==  "Canada" |df$Country == "Mexico")
 {df$Continent <- "North America"}
if(df$Country == "Spain" |df$Country == "France" |df$Country == "Germany")
{df$Continent <- "Europe"}
## .. etc

summary(df)

However, each time I run it, I find that it assigns North America to all the countries. I understand that this may sound trivial, but does it make a difference if I use if statements everywhere and not else or if else? Any suggestions for correcting this?

neilfws
Share
    `if` and `ifelse` are not the same at all. You're probably better off using a lookup-table of sorts - http://stackoverflow.com/questions/18456968/how-do-i-map-a-vector-of-values-to-another-vector-with-my-own-custom-map-in-r/18457055 – thelatemail Mar 09 '17 at 03:45
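To illustrate the difference: `if` expects a single TRUE or FALSE, so when it is handed a whole column it only looks at the first element (older versions of R warn about this; R 4.2.0 and later stop with an error). The first country in the sample is USA, which is why the whole column ended up as North America. Below is a minimal sketch of the named-vector lookup that the linked answer describes, assuming the sample data from the question. The name `continent_map` is made up for illustration.

```r
# Sample data from the question
df <- data.frame(Site = paste0("Site", 1:5),
                 Country = c("USA", "Vietnam", "Spain", "Germany", "China"),
                 stringsAsFactors = FALSE)

# Hypothetical lookup: names are countries, values are continents
continent_map <- c(USA = "North America", Canada = "North America",
                   Mexico = "North America", Spain = "Europe",
                   France = "Europe", Germany = "Europe",
                   Vietnam = "Asia", China = "Asia")

# Index the map by country name; countries missing from the map give NA
df$Continent <- unname(continent_map[df$Country])
df
```

Because the indexing is vectorized, every row is handled in one step and no `if` is needed.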

3 Answers

6

Build a lookup table and merge() it with the data.

For example:

lookup <- data.frame(Country = c("USA", "Canada", "Mexico",
                                 "Spain", "France", "Germany",
                                 "Vietnam", "China"),
                     Continent = rep(c("North America", "Europe", "Asia"),
                                     times = c(3,3,2)))

Using your snippet of data as data frame df, we can add Continent via merge() (a join in database terminology):

> merge(df, lookup, sort = FALSE, all.x = TRUE)
  Country  Site     Continent
1     USA Site1 North America
2 Vietnam Site2          Asia
3   Spain Site3        Europe
4 Germany Site4        Europe
5   China Site5          Asia
Gavin Simpson
  • Although I want to use countries, I need to divide the USA into Northern and Southern sites based on its states, which are in another column. So I may prefer an if-loop like the one I have written, for easy manipulation. Also, I have many, many countries in each continent. If I do this I will have to keep track of each country in the continent, and if there is a mistake it's hard to identify which continent I went wrong in. – Share Mar 09 '17 at 04:04
  • I appreciate the method, and thank you; I am merely pointing out its practical difficulties. – Share Mar 09 '17 at 04:05
  • @Ash - merge can deal with multiple `by=` variables so you can have a lookup table with both country and state in it. This method still works perfectly. – thelatemail Mar 09 '17 at 04:07
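A rough sketch of what a two-key lookup could look like. The `State` column, the state values, and the `Region` labels are all invented for illustration, since the real data is not shown in the question.

```r
# Hypothetical data: a State column is assumed; it is NA outside the USA
df <- data.frame(Site = paste0("Site", 1:4),
                 Country = c("USA", "USA", "Spain", "China"),
                 State = c("Texas", "Maine", NA, NA),
                 stringsAsFactors = FALSE)

# Lookup keyed on both Country and State (region labels are invented)
lookup <- data.frame(Country = c("USA", "USA", "Spain", "China"),
                     State = c("Texas", "Maine", NA, NA),
                     Region = c("Southern USA", "Northern USA",
                                "Europe", "Asia"),
                     stringsAsFactors = FALSE)

# Join on both columns; all.x = TRUE keeps sites that have no match
result <- merge(df, lookup, by = c("Country", "State"),
                all.x = TRUE, sort = FALSE)
result
```

Keeping the mapping in one table also makes mistakes easier to spot than a long chain of conditions, because each country/state pair appears on exactly one row.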
3

If you're working with a factor you can also do some nonsense with levels, or levels<- to be exact:

`levels<-`(dat$Country, list(
  `North America`   = c("USA","Canada","Mexico"),
  `Europe`          = c("Spain","France","Germany"),
  `Asia`            = c("Vietnam","China")
))
#[1] North America Asia          Europe        Europe        Asia         
#Levels: North America Europe Asia
thelatemail
  • I don't fully understand. I am not creating another vector with continents and adding it to another data frame. Also, although I want to use countries, I need to divide the USA into Northern and Southern sites based on its states, which are in another column. So I may prefer an if-loop like the one I have written, for easy manipulation. Any suggestions on how to rectify yours? – Share Mar 09 '17 at 04:01
  • 1
    @Ash - well, that changes the entire question. Gavin's lookup table idea is the best if you are dealing with multiple variables. There is rarely a need to do an if-loop in R as you can do things like ?merge or ?match – thelatemail Mar 09 '17 at 04:06
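For reference, a minimal sketch of the `match()` route mentioned in this comment, assuming the sample data from the question and the same lookup table as in Gavin's answer. The variable `idx` is just an illustrative name.

```r
# Sample data from the question
df <- data.frame(Site = paste0("Site", 1:5),
                 Country = c("USA", "Vietnam", "Spain", "Germany", "China"),
                 stringsAsFactors = FALSE)

# Lookup table: one row per country
lookup <- data.frame(Country = c("USA", "Canada", "Mexico",
                                 "Spain", "France", "Germany",
                                 "Vietnam", "China"),
                     Continent = rep(c("North America", "Europe", "Asia"),
                                     times = c(3, 3, 2)),
                     stringsAsFactors = FALSE)

# match() returns, for each df$Country, its row position in lookup
# (NA when the country is not listed)
idx <- match(df$Country, lookup$Country)
df$Continent <- lookup$Continent[idx]
df
```

Unlike `merge()`, this keeps the rows and columns of `df` in their original order and simply adds the new column.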
2

I like ifelse() for things like this. You could use it with the %in% operator like this:

df$Continent <- NA  # create the column first so ifelse() has a fallback value
df$Continent <- ifelse(df$Country %in% c("USA", "Canada", "Mexico"),
                       "North America", df$Continent)
df$Continent <- ifelse(df$Country %in% c("Spain", "France", "Germany"),
                       "Europe", df$Continent)
df
   Site Country     Continent
1 Site1     USA North America
2 Site2 Vietnam          <NA>
3 Site3   Spain        Europe
4 Site4 Germany        Europe
5 Site5   China          <NA>
Nate
  • I get the following error `Error in `$<-.data.frame`(`*tmp*`, "Continent", value = logical(0)) : replacement has 0 rows, data has 1000` ##{My original data has 1000 values} – Share Mar 09 '17 at 03:58
  • You have to do `df$Continent <- NA` first probably – thelatemail Mar 09 '17 at 03:58