I am creating a new column in my data frame of > 900,000 rows, and I want its values to be:
NA if any of the values in a set of columns are NA:
df$newcol[is.na(df$somecol_1) | is.na(df$somecol_2) | is.na(df$somecol_3)] <- NA
0 if all of the values in a set of columns are 0:
df$newcol[df$somecol_1==0 & df$somecol_2==0 & df$somecol_3==0] <- 0
1 if any of the values in a set of columns are 1 while none is NA. This is the tricky part, as it creates a myriad of combinations with my ten columns. The whole data frame has > 50 columns, of which ten are of interest for this procedure; here I present only three:
df$newcol[df$somecol_1==1 & df$somecol_2==0 & df$somecol_3==0] <- 1
df$newcol[df$somecol_1==1 & df$somecol_2==1 & df$somecol_3==0] <- 1
df$newcol[df$somecol_1==1 & df$somecol_2==1 & df$somecol_3==1] <- 1
df$newcol[df$somecol_1==0 & df$somecol_2==1 & df$somecol_3==0] <- 1
df$newcol[df$somecol_1==0 & df$somecol_2==1 & df$somecol_3==1] <- 1
df$newcol[df$somecol_1==0 & df$somecol_2==0 & df$somecol_3==1] <- 1
df$newcol[df$somecol_1==1 & df$somecol_2==0 & df$somecol_3==1] <- 1
I have a feeling I am overthinking this. Is there a way to make the third case (assigning 1) easier? Writing out every combination of columns as shown above would take forever with ten columns, and a loop would be too slow on such a large dataset.
Dummy data:
df <- NULL
df$somecol_1 <- c(1,0,0,NA,0,1,0,NA,1,1)
df$somecol_2 <- c(NA,1,0,0,0,1,0,NA,0,0)
df$somecol_3 <- c(0,0,0,0,0,0,0,0,0,0)
df <- as.data.frame(df)
Based on the rules above, I want the new column to be:
df$newcol <- c(NA,1,0,NA,0,1,0,NA,1,1)
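For reference, the three rules can be sketched in vectorized form without enumerating combinations. This is only a sketch, and it assumes the columns of interest contain nothing but 0, 1, or NA. The `cols` vector and the choice to select columns by the `somecol_` prefix are illustrative, not part of my real data. It relies on `rowSums()` returning NA for any row that contains an NA when `na.rm` is left at its default of `FALSE`.

```r
# Dummy data from the question, built in one call
df <- data.frame(
  somecol_1 = c(1, 0, 0, NA, 0, 1, 0, NA, 1, 1),
  somecol_2 = c(NA, 1, 0, 0, 0, 1, 0, NA, 0, 0),
  somecol_3 = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0)
)

# Columns of interest. Selecting by prefix is an assumption made for
# illustration; any character vector of column names would do.
cols <- grep("^somecol_", names(df), value = TRUE)

# Assumes every value is 0, 1, or NA.
# rowSums() gives NA if the row contains an NA, 0 if all values are 0,
# and a positive count if at least one value is 1.
# "> 0" keeps the NA and maps the rest to TRUE/FALSE; as.numeric() gives 1/0.
df$newcol <- as.numeric(rowSums(df[cols]) > 0)
```

Because there is no explicit loop over rows and no list of combinations, the same line should apply unchanged to ten columns; I have not timed it on the full 900,000 rows.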