I have multiple-response data that has been split into separate columns with cSplit_e, giving a format like this:
  ID Response IM2 IM4 ... IM10 IM16
1  1   4,7,10  NA   1 ...    1   NA
2  2 7,5,16,8  NA  NA ...   NA    1
3  3     2,10   1  NA ...    1   NA
I'm trying to set up a function that will check each row to see whether a subset of columns contains at least one "1". It should then create a new column and set it to "1" for every row that had at least one "1" in the specified columns.
Previously I have done this by writing out a for loop for each column I want to create, like so...
parade$q9PaperAggregate <- NA
parade$q9MagazineAggregate <- NA
#Newspaper Aggregate Loop
for (i in 1:nrow(parade)) { #Starts loop setting i to each row number
  if (is.na(parade$q9PaperAds[i]) == FALSE | ##These three lines check each row is not all NA
      is.na(parade$q9PaperCircs[i]) == FALSE |
      is.na(parade$q9PaperWebAds[i]) == FALSE) {
    parade$q9PaperAggregate[i] <- 1 #Sets agg cell value to 1 if not all NA for each i
  }
}
#Magazine Aggregate Loop
for (i in 1:nrow(parade)) {
  if (is.na(parade$q9MagazineAds[i]) == FALSE |
      is.na(parade$q9MagazineWebAds[i]) == FALSE) {
    parade$q9MagazineAggregate[i] <- 1
  }
}
This works, but it is clearly inefficient, since every new aggregate column needs its own hand-written loop. I want to create a general function that does this for any set of input columns. Here is what I have so far:
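For comparison, the two loops above could probably be collapsed into vectorised expressions. This is only a sketch under assumptions: the `parade` data below is invented for illustration, and `any_non_na` is a hypothetical helper name, not something from a package.

```r
# Toy stand-in for the real `parade` data (values are made up)
parade <- data.frame(
  q9PaperAds       = c(1, NA, NA),
  q9PaperCircs     = c(NA, NA, 1),
  q9PaperWebAds    = c(NA, NA, NA),
  q9MagazineAds    = c(NA, 1, NA),
  q9MagazineWebAds = c(NA, NA, NA)
)

# Hypothetical helper: 1 if any of `cols` is non-NA in a row, otherwise NA
# (this mirrors the loops, which leave the aggregate as NA when nothing is set)
any_non_na <- function(df, cols) {
  hit <- rowSums(!is.na(df[, cols, drop = FALSE])) > 0
  ifelse(hit, 1, NA)
}

parade$q9PaperAggregate <- any_non_na(
  parade, c("q9PaperAds", "q9PaperCircs", "q9PaperWebAds")
)
parade$q9MagazineAggregate <- any_non_na(
  parade, c("q9MagazineAds", "q9MagazineWebAds")
)
```

`rowSums(!is.na(...))` counts the non-NA cells per row, so the number of columns being checked can vary freely.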
#df = object; n = new column name; col = vector of columns I want to check
atleastone <- function(df, n, col) {
  #n = new column name (will run over list of vector - new col names with the old columns you want to agg)
  df[n] <- NA
  for (i in 1:nrow(df)) { #Starts loop setting i to each row number
    if (df[i, col] == 1) {
      (df[n])[i] <- 1 #Sets new column cell value to 1 if not all NA for each i
    }
  }
}
My two main issues are 1) how to make the for loop check several columns for the value when the number of columns to check can vary, and 2) how to pass the row and column to the subset. Currently "col" holds the actual name of the column, while "i" is just the numeric row index. This was fine in the format I used before of...
df$column[i]
...but the $ operator doesn't seem to work with values being passed to it from a function.
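To make the second issue concrete, here is a small sketch of how `$` and `[[` differ when the column name is held in a variable, and of how a multi-column check can be reduced to a single value for `if`. The data frame and the variables `col` and `cols` are invented for the example.

```r
# Toy data frame; values are made up
df <- data.frame(IM2 = c(NA, NA, 1), IM4 = c(1, NA, NA))
col <- "IM4"

df$col        # NULL: `$` looks for a column literally named "col"
df[[col]]     # the IM4 column: `[[` uses the value stored in `col`
df[[col]][1]  # a single cell, row 1
df[1, col]    # the same cell via row/column indexing

cols <- c("IM2", "IM4")
df[1, cols] == 1                     # several values, which `if` cannot use directly
any(df[1, cols] == 1, na.rm = TRUE)  # collapsed to a single TRUE/FALSE
```

With `na.rm = TRUE`, the NA cells are ignored rather than turning the whole condition into NA.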
Any idea what I'm doing wrong here? Is there a better way to do this?
Thank you for your time.
EDIT:
I turned @SymbolixAU's response into a function, like so:
#Aggregate Function
#takes input df = object; l = columns you want to agg; n = name of new column in double quotes
agger <- function(df, l, n) {
  #checks if the sum of the rows in the specified columns is greater than 0 (NAs are ignored)
  #this produces a logical value which is multiplied by 1 to change it to numeric
  df[n] <- ((rowSums(df[, l, drop = FALSE] == 1, na.rm = TRUE) > 0) * 1)
  df #returns the modified data frame; otherwise the new column is lost when the function exits
}
Follow-up question: I am trying to use mapply to pass a list "x" of two different vectors of column names to the argument "l", and a vector "y" of two names for the new columns to the argument "n", with the target object df = BR. The command looks like this:
mapply(agger, l = x, n = y, MoreArgs = list(BR))
This is sending me to the debug window with no messages or info on what is going wrong. Is my mapply set up incorrectly and/or is there a better way to run this function on multiple groups of columns in the same dataframe?
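One possible alternative to mapply, shown here only as a sketch: since each call needs to add a column to the same data frame, a plain loop over a named list keeps the updates cumulative. `BR` below is toy data, and `add_flag`, `groups`, and the new column names `low` and `high` are all hypothetical.

```r
# Toy stand-in for BR; values are made up
BR <- data.frame(
  IM2  = c(NA, NA, 1),
  IM4  = c(1, NA, NA),
  IM10 = c(1, NA, 1),
  IM16 = c(NA, 1, NA)
)

# Hypothetical helper: returns df with a new 0/1 column named `n`
add_flag <- function(df, l, n) {
  df[[n]] <- (rowSums(df[, l, drop = FALSE] == 1, na.rm = TRUE) > 0) * 1
  df
}

# Names of the list are the new columns; elements are the columns to check
groups <- list(low = c("IM2", "IM4"), high = c("IM10", "IM16"))

for (n in names(groups)) {
  BR <- add_flag(BR, groups[[n]], n)  # reassign so each new column is kept
}
BR
```

The reassignment `BR <- add_flag(...)` matters here: R functions work on a copy of the data frame, so a call that does not return and reassign the result leaves `BR` unchanged.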
Thank you.