Can I use a variable to identify a column name in an R dataframe?

Question

Let's say I have a df with many columns, 10 of which are named "date1" to "date10".
I want to do something like:

for (x in 1:total_number_of_rows) {
   for (y in 1:10) {
      column_variable_name <- paste0("date",y)
      if (df$column_variable_name[x] <= df$another_date_column[x]) {
         *lots more code here* }
   }
}

Right now what I'm doing is:

for (x in 1:total_number_of_rows) {
   if (df$date1[x] <= df$another_date_column[x]) {
      *lots more code here* }
   if (df$date2[x] <= df$another_date_column[x]) {
      *lots more code here* }
   if (df$date3[x] <= df$another_date_column[x]) {
      *lots more code here* }
   if (df$date4[x] <= df$another_date_column[x]) {
      *lots more code here* }
   if (df$date5[x] <= df$another_date_column[x]) {
      *lots more code here* }
   etc...
}

And the "lots more code here" is all the same code each time.
The goal is to not have to copy and paste the code 10 times just because I need to change the variable name in the if statement. The first set of code above doesn't work because it's looking for a column in the dataframe called "column_variable_name". Is there any way to do this? What I'm doing in the second set of code seems unnecessary.

What is `another variable`? `if/else` works on a single row. Are you looking for `all` values less than the other variable? — akrun, Aug 01 '20 at 00:07
Really need a reproducible example, not clear what you're asking. You don't want to use a for loop to mutate a set of dataframe columns, but you *could* try: if (df[column_variable_name] <= *another variable*) if you need something quick — Bill O'Brien, Aug 01 '20 at 00:14
I was trying to keep the example simple. It's basically a large dataframe with a set of consecutively numbered column names that are dates. I'm then comparing them to another set of date columns and conducting other tasks if one date is smaller than the other. Let me rewrite the code to something more realistic to what I'm doing to see if that will help. — hojoko, Aug 01 '20 at 00:25
Are you trying to do a one-to-one comparison or a one-to-many comparison? — Michelle, Aug 01 '20 at 05:00
This is a tempting way to go, but you should really never be renaming your variables sequentially like this, or using paste, etc. I have an answer about this using Python as the example, but the basic ideas are the same: https://stackoverflow.com/questions/57546321/python-change-variable-suffix-with-for-loop/57546424#57546424 Your `date1..daten` columns should be stored in an iterable variable of their own and not in `n` different variables. — beroe, Aug 01 '20 at 07:15
I think I understand your comment and associated link, however, I'm not working in a situation of best practices and the data I'm working with is from a giant excel file I've read in. The Python example makes sense to use "test[i]" but the variable is just that, a lone variable. In my example, I'm working with a dataframe so I don't think that "df$test[i]" would work because it would be looking for a column named "test" within df. — hojoko, Aug 03 '20 at 14:05

score 2 · Answer 1 · answered Aug 01 '20 at 00:14

If we need to identify the fruit columns having all values less than or equal to a particular column:

nm1 <- names(df)[startsWith(names(df), "fruit")]
nm1[!colSums(df[nm1] > df[["anothervariable"]], na.rm = TRUE)]
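As a minimal sketch of this approach on a small made-up data frame (the column names `fruit1` to `fruit3` and all values are hypothetical, chosen only for illustration):

```r
# Hypothetical toy data; none of these values come from the question
df <- data.frame(
  fruit1 = c(1, 2, 3),
  fruit2 = c(5, 1, 2),
  fruit3 = c(0, 4, 3),
  anothervariable = c(2, 4, 3)
)

# Names of the columns whose names start with "fruit"
nm1 <- names(df)[startsWith(names(df), "fruit")]

# Count, per column, how many values exceed anothervariable;
# a count of zero means every value is <= anothervariable
exceed_count <- colSums(df[nm1] > df[["anothervariable"]], na.rm = TRUE)
keep <- nm1[exceed_count == 0]
print(keep)
```

Here `fruit2` should be dropped, because its first value (5) is greater than the matching value of `anothervariable` (2).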

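Or another option is Reduce, collecting the names of the columns that pass the check

Reduce(function(keep, nm) if (all(df[[nm]] <= df[["anothervariable"]], na.rm = TRUE)) c(keep, nm) else keep, nm1, character(0))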
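For the loop in the question itself, `$` takes the column name literally, whereas `[[` accepts a name stored in a variable. A minimal sketch, assuming hypothetical toy data with only two date columns instead of ten, and with a placeholder counter (`hits`) standing in for the "lots more code here" part:

```r
# Hypothetical toy data; the real data frame would have date1..date10
df <- data.frame(
  date1 = as.Date(c("2020-01-01", "2020-03-01")),
  date2 = as.Date(c("2020-02-15", "2020-01-10")),
  another_date_column = as.Date(c("2020-02-01", "2020-02-01"))
)

hits <- 0  # placeholder for the repeated code
for (x in seq_len(nrow(df))) {
  for (y in 1:2) {
    column_variable_name <- paste0("date", y)
    # df[[name]] looks up the column whose name is the value of the variable
    if (df[[column_variable_name]][x] <= df$another_date_column[x]) {
      hits <- hits + 1
    }
  }
}
print(hits)
```

With a base data frame, the same single value can also be read as `df[x, column_variable_name]`.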
