I have figured out a very inefficient way of using vectors in an if statement, but can't figure out how to use ifelse() or sapply() or any better way of doing it.
I have the following data:
yes_codes <- c(1,3,7)
yes_flags <- 'Y'
yes_year <- 2011
df2 <- data.frame(yes_codes, yes_flags, yes_year)
codes <- c(1:10)
flag <- 'N'
year <- c(2011,2012,2011,2012,2011,2013,2014,2015,2011,2010)
df <- data.frame(codes, flag, year)
> df
   codes flag year
1      1    N 2011
2      2    N 2012
3      3    N 2011
4      4    N 2012
5      5    N 2011
6      6    N 2013
7      7    N 2014
8      8    N 2015
9      9    N 2011
10    10    N 2010
> df2
  yes_codes yes_flags yes_year
1         1         Y     2011
2         3         Y     2011
3         7         Y     2011
I need to match df$codes with df2$yes_codes and set df$flag to 'Y' when they match. The only way I have figured out how to do this is very, very obviously wrong:
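for(i in 1:nrow(df)) {
  for(z in 1:nrow(df2)) {
    # only rows from 2011 or 2012 are eligible
    if(df$year[i]==2011 | df$year[i]==2012)
      # compare the code in row i (not the whole column) with yes code z
      if(as.character(df$codes[i])==as.character(df2$yes_codes[z]))
        if(df$year[i]==df2$yes_year[z])
          df$flag[i] <- 'Y'
  }
}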
I know you're supposed to use ifelse() to do vectorized if statements, but this doesn't work either:
ifelse(df$year==2011 | df$year==2012, ifelse(df$code==df2$yes_code,
ifelse(df$year==df2$year, df$flag <- 'Y',
df$flag <- 'N'), df$flag <- 'N'), df$flag <- 'N')
This sets EVERY flag to 'Y' or 'N' with every iteration and all I get is whatever was set last, which is usually 'N'. I really thought I had found a perfect example of why you use <- and = for different things, but it won't even run when I switch the <- for =.
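For what it's worth, here is a minimal sketch of what seems to happen with assignments inside ifelse(), using toy data (x and flag are made-up names for illustration, not the columns above). As far as I can tell, each assignment runs as an ordinary whole-object assignment when ifelse() evaluates that argument, so the last one evaluated overwrites everything, while the value ifelse() returns is a separate vector:

```r
# Toy data; x and flag are hypothetical names used only for this illustration
x <- c(1, 2, 3, 4)

# ifelse() builds and returns a new vector; it does not modify anything itself
res <- ifelse(x > 2, "Y", "N")
print(res)

# An assignment passed as the yes/no argument is evaluated as a side effect
# on the whole object, so whichever assignment runs last wins
flag <- rep("?", 4)
invisible(ifelse(x > 2, flag <- "Y", flag <- "N"))
print(flag)
```

The per-element result only exists in the returned vector (res here), which is why it has to be assigned outside the call.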
EDIT:
As Sotos explained to me, ifelse() simply returns a vector of values, so I need to assign its result outside of the call. My problem now is that I actually have several ifelse() conditions that I need to check, because, for example, I have one rule that applies to 2011 and 2012 and another that applies to 2012 and greater. Writing multiple ifelse() statements just overwrites the output of the previous one with the else output when done as follows:
df$flag <- ifelse(df$year==2013 & df$codes==df2$yes_code & df$year==df2$yes_year, 'Y', 'N')
df$flag <- ifelse(df$year >= 2012 & df$codes=='4', 'Y', 'N')
df$flag <- ifelse((df$year==2011 | df$year==2012) & df$code==df2$yes_code & df$year==df2$year, 'Y', 'N')
It's having to use else that is making this so difficult. Is there any other way to use a vectorized if statement?
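One direction that might avoid the else entirely is logical indexing: build a logical vector per rule and assign 'Y' only to the rows where a rule holds, leaving every other row untouched. Below is a rough sketch under my own assumptions, not a definitive solution. The two rules are just my reading of the ones described above, the names key_df, key_df2, rule1 and rule2 are made up for the example, and pairing code and year through paste() is one possible way to match on both columns at once.

```r
# Data as defined earlier in the post (stringsAsFactors = FALSE keeps flag
# a character column on older R versions as well)
codes <- 1:10
year  <- c(2011, 2012, 2011, 2012, 2011, 2013, 2014, 2015, 2011, 2010)
df    <- data.frame(codes, flag = "N", year, stringsAsFactors = FALSE)
df2   <- data.frame(yes_codes = c(1, 3, 7), yes_flags = "Y", yes_year = 2011,
                    stringsAsFactors = FALSE)

# Assumed rule 1: year is 2011 or 2012 and the (code, year) pair occurs in df2
key_df  <- paste(df$codes, df$year)
key_df2 <- paste(df2$yes_codes, df2$yes_year)
rule1   <- df$year %in% c(2011, 2012) & key_df %in% key_df2

# Assumed rule 2: code 4 from 2012 onward
rule2 <- df$year >= 2012 & df$codes == 4

# Overwrite only the rows where some rule holds; other rows keep their value
df$flag[rule1 | rule2] <- "Y"
print(df)
```

Because nothing is written to the rows where the condition is FALSE, a later rule cannot reset a 'Y' set by an earlier one. Using %in% instead of == also avoids comparing a 10-row column against a 3-row column element by element, which recycles the shorter vector.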