How to substitute "Ethiopia" with "Ethiopia (-1992)" and "Ethiopia (1993-)" based on the year

Question

I am trying to substitute "Ethiopia" in location_1 with "Ethiopia (-1992)" if location_1 says "Ethiopia" and the years correspond to all years up to and including 1992 and with "Ethiopia (1993-)" if location_1 says "Ethiopia" and the years correspond to all years from 1993 forward.

Unfortunately, the code I came up with substitutes all with "Ethiopia (-1992)" even for those years after 1992.

The following is the code:

if (mydata$year >= 1992) {
  mydata$location_1 <- sub("Ethiopia", "Ethiopia (-1992)", mydata$location_1)
} else mydata$location_1 <- sub("Ethiopia", "Ethiopia (1993-)", mydata$location_1)

I was hoping that I would have all "Ethiopia" turned into either "Ethiopia (-1992)" or "Ethiopia (1993-)" based on the year. Instead, the results are that all "Ethiopia" become "Ethiopia (-1992)".

Assuming your `year` column contains values >= 1992, the error lies in your `if` condition. You are converting all values whose `year` satisfies `>= 1992` to `"Ethiopia (-1992)"`, and those that do not (the else block) to `"Ethiopia (1993-)"`. This is the opposite of what you stated in your question. — Argon, Jun 30 '19 at 01:20
Can you share some of your data? Read here to learn how: [How to make a reproducible example in r?](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) — M--, Jun 30 '19 at 01:21

score 2 · Answer 1 · answered Jun 30 '19 at 02:39

You can replace the column in the subset of your data:

mydata[which(mydata$location_1 == "Ethiopia" & mydata$year <= 1992),
       "location_1"] <- "Ethiopia (-1992)"

mydata[which(mydata$location_1 == "Ethiopia" & mydata$year >  1992),
       "location_1"] <- "Ethiopia (1993-)"

Or use dplyr:

library(dplyr)
# Assign the result back; mutate() does not modify mydata in place
mydata <- mydata %>% 
  mutate(location_1 = case_when(location_1 == "Ethiopia" & year <= 1992 ~ "Ethiopia (-1992)",
                                location_1 == "Ethiopia" & year > 1992 ~ "Ethiopia (1993-)",
                                TRUE ~ location_1))

`which` is superfluous here; you can subset by a logical vector directly — alistaire, Jun 30 '19 at 07:21
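
As the comment above notes, a logical vector can index the rows directly. A minimal sketch on a small hand-made data frame (the name `toy` and its values are made up for illustration, not taken from the question):

```r
# Hypothetical toy data, not the asker's real data
toy <- data.frame(year = c(1984, 1993, 1990),
                  location_1 = c("Ethiopia", "Ethiopia", "China"),
                  stringsAsFactors = FALSE)

# Logical vector marking the Ethiopia rows (computed before any replacement)
is_eth <- toy$location_1 == "Ethiopia"

# Assign through logical indexing; no which() needed
toy$location_1[is_eth & toy$year <= 1992] <- "Ethiopia (-1992)"
toy$location_1[is_eth & toy$year >  1992] <- "Ethiopia (1993-)"

toy$location_1
```

One difference to keep in mind: `which()` drops `NA` positions, so it can still matter when `year` contains missing values.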

score 0 · Answer 2 · answered Jun 30 '19 at 05:04

A data.table approach. data.table is a very fast package; see ?data.table for details:

library(data.table)
setDT(mydata)  # convert the data.frame to a data.table by reference

mydata[location_1 == "Ethiopia" & !is.na(year), 
       location_1 := ifelse(year <= 1992, 
                            "Ethiopia (-1992)", 
                            "Ethiopia (1993-)")]

What is in there:

`mydata[location_1 == "Ethiopia" & !is.na(year),` filters the rows in which location_1 is Ethiopia and a year is present (we don't want to assign a name wrongly when the year is not available).

`location_1 :=` is an assignment call (`:=` is data.table's assignment operator, which modifies the column by reference).

`ifelse(year <= 1992, x, y)` returns x where the condition is TRUE, and y otherwise.
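
To see why filtering on `!is.na(year)` matters, here is a small base R sketch of how `ifelse` treats a missing year (the vector `yrs` is made up for illustration):

```r
# Hypothetical years, one of them missing
yrs <- c(1984, NA, 1999)

# ifelse() works element-wise; an NA condition yields NA in the result
labels <- ifelse(yrs <= 1992, "Ethiopia (-1992)", "Ethiopia (1993-)")
labels
```

Without the filter, a row with a missing year would presumably have its location overwritten with `NA` rather than keeping "Ethiopia".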

score 0 · Answer 3 · answered Jun 30 '19 at 08:44

An `if`/`else` statement like the one you are using tests a single condition, not one condition per row, so it belongs inside a loop that visits each row. A for loop, for example:

for (i in 1:nrow(mydata)){
    if (mydata$location_1[i] == "Ethiopia") {
        if (mydata$year[i] <= 1992) mydata$location_1[i] <- "Ethiopia (-1992)"
        else mydata$location_1[i] <- "Ethiopia (1993-)"
    }
}

#### OUTPUT ####

   year       location_1
1  1994          Germany
2  1998          Germany
3  1993 Ethiopia (1993-)
4  1982          Germany
5  1989            China
6  1997 Ethiopia (1993-)
7  2001            China
8  1990            China
9  1984 Ethiopia (-1992)
10 1999 Ethiopia (1993-)

You can achieve the same goal somewhat more compactly (and perhaps a little faster) using the vectorized function ifelse:

mydata$location_1 <- ifelse(mydata$location_1 == "Ethiopia",
       ifelse(mydata$year <= 1992, "Ethiopia (-1992)", "Ethiopia (1993-)"),
       mydata$location_1
       )
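
A quick way to check that the nested `ifelse` handles the 1992/1993 boundary and leaves other countries alone is to run it on a tiny hand-made data frame (the name `check` and its values are illustrative, not from the question):

```r
# Hypothetical data covering both sides of the boundary plus another country
check <- data.frame(year = c(1992, 1993, 1992),
                    location_1 = c("Ethiopia", "Ethiopia", "Germany"),
                    stringsAsFactors = FALSE)

# The comparison is evaluated for every row at once and returns a logical vector
is_early <- check$year <= 1992

# Outer ifelse picks the Ethiopia rows; inner ifelse picks the period label
check$location_1 <- ifelse(check$location_1 == "Ethiopia",
                           ifelse(is_early, "Ethiopia (-1992)", "Ethiopia (1993-)"),
                           check$location_1)
check$location_1
```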

Personally, I would probably just create a new variable with the country name followed by (-1992) or (1993-). It is syntactically compact, comparatively fast, and all information is maintained, which can be useful for later subsetting:

mydata$cy <- paste(mydata$location_1, ifelse(mydata$year <= 1992,
                                             "(-1992)", "(1993-)"
                                             ))

#### OUTPUT ####

   year location_1               cy
1  1994    Germany  Germany (1993-)
2  1998    Germany  Germany (1993-)
3  1993   Ethiopia Ethiopia (1993-)
4  1982    Germany  Germany (-1992)
5  1989      China    China (-1992)
6  1997   Ethiopia Ethiopia (1993-)
7  2001      China    China (1993-)
8  1990      China    China (-1992)
9  1984   Ethiopia Ethiopia (-1992)
10 1999   Ethiopia Ethiopia (1993-)
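
As an illustration of the later subsetting mentioned above, a minimal sketch (the data frame `toy` is made up; the column names follow this answer):

```r
# Hypothetical toy data, not the asker's real data
toy <- data.frame(year = c(1984, 1997, 1989),
                  location_1 = c("Ethiopia", "Ethiopia", "China"),
                  stringsAsFactors = FALSE)
toy$cy <- paste(toy$location_1, ifelse(toy$year <= 1992, "(-1992)", "(1993-)"))

# Select only the pre-1993 Ethiopia rows via the combined label
early_eth <- toy[toy$cy == "Ethiopia (-1992)", ]

# The original country name is still available, e.g. for counting per country
table(toy$location_1)
```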

Data:

set.seed(123)

mydata <- data.frame(year = sample(1980:2004, 10, TRUE),
                     location_1 = sample(c("Ethiopia", "Germany", "China"), 10, TRUE),
                     stringsAsFactors = FALSE
                     )
