I have figured out a very inefficient way of using vectors in an if statement, but can't figure out how to use ifelse() or sapply() or any better way of doing it.

I have the following data:

yes_codes <- c(1,3,7)
yes_flags <- 'Y'
yes_year <- 2011
df2 <- data.frame(yes_codes, yes_flags, yes_year)
codes <- c(1:10)
flag <- 'N'
year <- c(2011,2012,2011,2012,2011,2013,2014,2015,2011,2010)
df <- data.frame(codes, flag, year)

> df
   codes flag year
1      1    N 2011
2      2    N 2012
3      3    N 2011
4      4    N 2012
5      5    N 2011
6      6    N 2013
7      7    N 2014
8      8    N 2015
9      9    N 2011
10    10    N 2010
> df2
  yes_codes yes_flags yes_year
1         1         Y     2011
2         3         Y     2011
3         7         Y     2011

I need to match df$codes with df2$yes_codes and set df$flag to 'Y' when they match. The only way I have figured out how to do this is very obviously wrong:

for(i in 1:nrow(df)) {
  for(z in 1:nrow(df2)){
    if(df$year[i]==2011 | df$year[i]==2012)
      if(as.character(df$codes[i])==as.character(df2$yes_codes[z]))
        if(df$year[i]==df2$yes_year[z])
          df$flag[i] <- 'Y'
  }
}

I know you're supposed to use ifelse() to do vectorized if statements, but this doesn't work either

ifelse(df$year==2011 | df$year==2012, ifelse(df$code==df2$yes_code, 
ifelse(df$year==df2$year, df$flag <- 'Y',
            df$flag <- 'N'), df$flag <- 'N'), df$flag <- 'N')

This sets EVERY flag to 'Y' or 'N' with every iteration and all I get is whatever was set last, which is usually 'N'. I really thought I had found a perfect example of why you use <- and = for different things, but it won't even run when I switch the <- for =.

EDIT:
As Sotos explained to me, ifelse() simply returns a value, so I need to assign its result outside of the call. My problem now is that I actually have several ifelse() conditions to check: for example, one rule applies to 2011 and 2012 and another applies to 2012 and later. Writing multiple ifelse() statements just overwrites the output of the previous one with the else output when done as follows:

df$flag <- ifelse(df$year==2013 & df$codes==df2$yes_code & df$year==df2$yes_year, 'Y', 'N')
df$flag <- ifelse(df$year >= 2012 & df$codes=='4', 'Y', 'N')
df$flag <- ifelse((df$year==2011 | df$year==2012) & df$code==df2$yes_code & df$year==df2$yes_year, 'Y', 'N')

Having to supply an else value is what makes this so difficult. Is there any other way to write a vectorized if statement?

jamzsabb
  • Your `ifelse` statement is wrong. One problem is that you cannot use `<-` inside `ifelse`... You can do `df$flag <- ifelse...` – Sotos May 16 '17 at 14:46
  • Thanks! That was what I needed to get moving. I've updated my question with another wall I'm hitting though. – jamzsabb May 16 '17 at 16:18
  • You can also nest `ifelse` statements, `ifelse(condition1, result1, ifelse(condition2, result2, result3))`. You might want to read [alternatives to nested `ifelse` statements](http://stackoverflow.com/q/30494520/903061). But at some point, if you have multiple rules applying to the same line you have to choose which one happens last (you say you have a rule for 2011 and 2012, and another rule for 2012 and greater, which means both these rules apply to 2012). – Gregor Thomas May 16 '17 at 16:33
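
A minimal sketch of the nesting described in the comment above, using the question's data. The rule order (the 2011/2012 code-list rule first, then the 2012-and-later rule for code 4) is an assumption for illustration, and `in_list` is a hypothetical helper vector, not something from the thread, that checks the code and year together.

```r
# Data from the question
df  <- data.frame(codes = 1:10, flag = "N",
                  year = c(2011, 2012, 2011, 2012, 2011, 2013, 2014, 2015, 2011, 2010))
df2 <- data.frame(yes_codes = c(1, 3, 7), yes_flags = "Y", yes_year = 2011)

# in_list (hypothetical helper): TRUE where the code/year pair appears in df2
in_list <- paste(df$codes, df$year) %in% paste(df2$yes_codes, df2$yes_year)

# Conditions are tested in order; "N" is the fallback only when no rule matches
df$flag <- ifelse(df$year %in% 2011:2012 & in_list, "Y",
           ifelse(df$year >= 2012 & df$codes == 4,  "Y", "N"))
```

Because only the innermost call supplies the 'N' fallback, a row that satisfies either rule ends up flagged 'Y'.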

3 Answers

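# Left-join df2 onto df by code; rows without a match get NA in the df2 columns
df3 <- merge(df, df2, by.x = 'codes', by.y = 'yes_codes', all.x = TRUE)
df3$flag <- ifelse(df3$yes_flags == "Y", "Y", "N")
# Unmatched rows come back as NA, so default them to "N"
df3$flag[is.na(df3$flag)] <- "N"
# Drop the helper columns that came from df2
df <- df3[, !(names(df3) %in% names(df2))]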
  • Thanks for the reply. I thought about using merge() too, but I need to be able to apply rules from multiple ifelse() statements. The example I provided may have been too simplified. – jamzsabb May 16 '17 at 15:54
  • Sure. You can stack ifelse statements so they don't write over values: `val0<-ifelse(cond, "val1",NA)` then `val0<-ifelse(newcond, "val2",val0)` –  May 16 '17 at 16:42
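
A sketch of the stacking pattern from the comment above, assuming the question's two rules are applied one after another. Each `ifelse()` passes the existing value through as its else branch, so a later rule never erases an earlier 'Y'. Starting from `NA` and filling in the default at the end is one option among several, and for brevity this version looks up the code only, not the year in `df2`.

```r
codes     <- 1:10
year      <- c(2011, 2012, 2011, 2012, 2011, 2013, 2014, 2015, 2011, 2010)
yes_codes <- c(1, 3, 7)

# Nothing decided yet
flag <- rep(NA_character_, length(codes))
# Rule 1: listed codes in 2011 or 2012
flag <- ifelse(year %in% 2011:2012 & codes %in% yes_codes, "Y", flag)
# Rule 2: code 4 from 2012 onward; rows hit by rule 1 keep their value
flag <- ifelse(year >= 2012 & codes == 4, "Y", flag)
# Default for rows that no rule touched
flag[is.na(flag)] <- "N"
```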

Here is a solution with data.table:

library("data.table")
dt2 <- data.table(yes_codes=c(1,3,7), yes_flags='Y',yes_year=2011)
dt  <- data.table(codes=(1:10), flag='N', year=c(2011,2012,2011,2012,2011,2013,2014,2015,2011,2010))

dt[dt2, on=c(codes="yes_codes", year="yes_year"), in.df2:=i.yes_flags]

dt[year==2013 & in.df2=='Y', flag:='Y']
dt[year>=2012 & codes==4, flag:='Y']
dt[(year==2011 | year==2012) & in.df2=='Y', flag:='Y']
dt
#    codes flag year in.df2
# 1:     1    Y 2011      Y
# 2:     2    N 2012     NA
# 3:     3    Y 2011      Y
# 4:     4    Y 2012     NA
# 5:     5    N 2011     NA
# 6:     6    N 2013     NA
# 7:     7    N 2014     NA
# 8:     8    N 2015     NA
# 9:     9    N 2011     NA
# 10:    10    N 2010     NA

Or you can do it in one big condition:

dt[(year==2013 & in.df2=='Y') | (year>=2012 & codes==4) | 
               ((year==2011 | year==2012) & in.df2=='Y'), flag:='Y']

You can also combine the first and third conditions:

dt[((year==2011 | year==2012 | year==2013) & in.df2=='Y') | (year>=2012 & codes==4), flag:='Y']
# and shorten it:
dt[((year %in% 2011:2013) & in.df2=='Y') | (year>=2012 & codes==4), flag:='Y']
jogo

To summarize the info I got in this thread: the answer to my first problem was 'don't try to set values inside of an ifelse(); use ifelse() to return a value and assign that result instead.'

For the second problem, where the else portion of each statement overwrote the results of previous statements, the answer was maddeningly simple: just return the current value. So the following

df$flag <- ifelse((df$year==2011 | df$year==2012) & df$code==df2$yes_code &
df$year==df2$yes_year, 'Y', 'N')

becomes this

df$flag <- ifelse((df$year==2011 | df$year==2012) & df$code==df2$yes_code &
df$year==df2$yes_year, 'Y', df$flag)
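
A rule like this can also be applied with no else branch at all, by assigning only to the rows where the condition holds. This is a sketch rather than what the thread settled on; it assumes the question's data, uses `%in%` for the code lookup, and introduces a hypothetical helper vector `hit`.

```r
df <- data.frame(codes = 1:10, flag = "N",
                 year = c(2011, 2012, 2011, 2012, 2011, 2013, 2014, 2015, 2011, 2010),
                 stringsAsFactors = FALSE)
yes_codes <- c(1, 3, 7)

# hit (hypothetical helper): one TRUE/FALSE per row of df
hit <- df$year %in% 2011:2012 & df$codes %in% yes_codes

# Only rows where hit is TRUE are changed; every other flag keeps its value
df$flag[hit] <- "Y"
```

Further rules can follow as separate assignments of the same shape without undoing earlier ones.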

Thanks to all who helped, this was a very difficult question to articulate.

jamzsabb
  • Is your code working with `df$code==df2$yes_code`? From your question: `df` and `df2` have different number of rows. – jogo May 17 '17 at 07:37
  • @jogo Yes, it seems to be working. I used an rpivottable to verify that the rules had been applied as I expected. I don't think the number of rows matching was important. Would it be better to use `%in%` or something? – jamzsabb May 17 '17 at 13:13
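
To illustrate the point raised in these comments: when two vectors differ in length, `==` recycles the shorter one and compares position by position, whereas `%in%` tests membership. A small sketch with the question's values (the names `by_position` and `by_membership` are only for illustration):

```r
codes     <- 1:10
yes_codes <- c(1, 3, 7)

# == recycles yes_codes as 1,3,7,1,3,7,... and compares element by element,
# so only codes[1] matches here; R also warns about the length mismatch
by_position <- suppressWarnings(codes == yes_codes)
which(by_position)     # 1

# %in% asks, for each code, whether it occurs anywhere in yes_codes
by_membership <- codes %in% yes_codes
which(by_membership)   # 1 3 7
```

With `==`, codes 3 and 7 are missed because they do not happen to line up with the recycled values, so a result that looks right under `==` may depend on the row order of the data.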