
I have a data frame, df, with several columns. I would like to write a function that creates new columns dynamically from existing column names, using the last four characters of each existing name. For example, I would like to create a variable named df$rev_2002 like so:

df$rev_2002 <- df$avg_2002 * df$quantity

The problem is I would like to be able to run the function every time a new column (say, df$avg_2003) is appended to the dataframe.

To this end, I used the following function to extract the last 4 characters of the df$avg_2002 variable:

substRight <- function (x,n) {
  substr(x, nchar(x)-n+1, nchar(x))
}

I tried putting together another function to create the columns:

revved <- function(x, y, z){
  z = x * y
  names(z) <- paste('revenue', substRight(x,4), sep = "_")
  return x
}

But when I try it on actual data I don't get new columns in my df. The desired result is a series of variables in my df such as:

df$rev_2002, df$rev_2003...df$rev_2020 or whatever is the largest value of the last four characters of the x variable (df$avg_2002 in example above).

Any help or advice would be truly appreciated. I'm really in the woods here.

jvalenti
  • Hello, could you show how you are using `revved` with a small example data set? Also, an easy way to programmatically make new columns with strings is the `[[` operator. – Justin Landis May 06 '21 at 20:34
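
As a rough illustration of the `[[` suggestion in the comment above, here is a minimal sketch. The toy data frame and the loop are assumptions made for the example, not the asker's real data; the idea is only that `[[` accepts a column name built as a string.

```r
# Hypothetical toy data, mirroring the column names used in the question
df <- data.frame(quantity = 3:4, avg_2002 = 5:6, avg_2003 = 7:8)

for (nm in grep("^avg_[0-9]{4}$", names(df), value = TRUE)) {
  # last four characters of the column *name*, e.g. "2002"
  year <- substr(nm, nchar(nm) - 3, nchar(nm))
  # `[[` takes the new column name as a string
  df[[paste("rev", year, sep = "_")]] <- df[[nm]] * df$quantity
}

names(df)
# "quantity" "avg_2002" "avg_2003" "rev_2002" "rev_2003"
```

Note that the year is taken from the column name, not from the column's values.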

1 Answer

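dat <- data.frame(id = 1:2, quantity = 3:4, avg_2002 = 5:6, avg_2003 = 7:8, avg_2020 = 9:10)
func <- function(dat, overwrite = FALSE) {
  # columns ending in avg_<digits>, and the matching rev_<digits> names
  nms <- grep("avg_[0-9]+$", names(dat), value = TRUE)
  revnms <- gsub("avg_", "rev_", nms)
  if (!overwrite) {
    # drop avg/rev pairs together so the two vectors stay aligned
    keep <- !revnms %in% names(dat)
    nms <- nms[keep]
    revnms <- revnms[keep]
  }
  # dat[nms] stays a data frame even when only one column matches
  if (length(nms)) dat[revnms] <- lapply(dat[nms], `*`, dat$quantity)
  dat
}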

func(dat)
#   id quantity avg_2002 avg_2003 avg_2020 rev_2002 rev_2003 rev_2020
# 1  1        3        5        7        9       15       21       27
# 2  2        4        6        8       10       24       32       40
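
To connect this with the question's goal of re-running after a new `avg_*` column is appended, here is a self-contained sketch of that scenario. `add_rev` is a hypothetical name for a variant of the function above that filters the `avg_`/`rev_` names as pairs and uses `[[` for the assignment; treat it as one possible way to do it rather than the definitive one.

```r
# add_rev is a hypothetical helper written only for this sketch
add_rev <- function(dat, overwrite = FALSE) {
  nms <- grep("avg_[0-9]+$", names(dat), value = TRUE)
  revnms <- sub("avg_", "rev_", nms)
  if (!overwrite) {
    # skip pairs whose rev_ column already exists
    keep <- !revnms %in% names(dat)
    nms <- nms[keep]
    revnms <- revnms[keep]
  }
  for (i in seq_along(nms)) {
    dat[[revnms[i]]] <- dat[[nms[i]]] * dat$quantity
  }
  dat
}

dat <- data.frame(id = 1:2, quantity = 3:4, avg_2002 = 5:6)
dat <- add_rev(dat)     # adds rev_2002
dat$avg_2003 <- 7:8     # a new year is appended later
dat <- add_rev(dat)     # adds only rev_2003; rev_2002 is left alone
names(dat)
# "id" "quantity" "avg_2002" "rev_2002" "avg_2003" "rev_2003"
```

Calling `add_rev(dat, overwrite = TRUE)` instead would recompute every `rev_*` column from the current `avg_*` values.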
r2evans
  • What if there are other columns in the data frame? – jvalenti May 06 '21 at 21:59
  • They aren't affected. The new columns are appended. Please try it! I promise this function is only additive, and only when specific conditions are met. If no `avg_*` columns exist, nothing is added/changed. (Okay, not 100% ... I just added the `overwrite=` argument. With the default of `FALSE`, existing `rev_*` columns will not be touched. That's the only "change" part of the function.) – r2evans May 06 '21 at 22:03
  • This is great. One more question: how would one use a vector of column names and substitutions? – jvalenti May 10 '21 at 19:26
  • I'm not sure what you're asking. In the function `nms` is a vector of column names, and `revnms` are the new (to be added) column names. – r2evans May 10 '21 at 19:51
  • My mistake... I am referring to the possibility of using the function on data frames other than `dat`. For example, the second to last line in `func` uses `dat$quantity`. Would I have to redefine the function to use it on a different data frame with different column names, because it explicitly references `dat$quantity`? I'm just trying to think of a way to generalize it, I guess. So my question is really about the second to last line. – jvalenti May 11 '21 at 14:28
  • The function works on whatever data.frame you send to it. Whatever the frame is named *outside* the function does not matter. If you were to do `func(mtcars)`, all references inside the function will see the data as `dat`. If there exists `dat` outside of the function, that frame is completely different from what the function sees inside. If you call `func(dat)`, the fact that `dat$quantity` is referenced inside is coincidentally the same. – r2evans May 11 '21 at 14:55
  • If instead you mean that the name to use as the quantity changes, then change the function definition to be `function(dat, overwrite=FALSE, value.var="quantity")` and then change `dat$quantity` to `dat[[value.var]]`. From there, if you have a frame with a different base-value field, call `func(otherdat, value.var="otherfield")`, and it should work similarly. – r2evans May 11 '21 at 14:56
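
The `value.var` change described in the last comment could look roughly like the sketch below. `func2`, `otherdat`, and the `units` column are hypothetical names chosen for the example; the argument named inside the function is independent of whatever the data frame is called by the caller.

```r
# func2, otherdat, and units are hypothetical names used only for this sketch
func2 <- function(dat, overwrite = FALSE, value.var = "quantity") {
  nms <- grep("avg_[0-9]+$", names(dat), value = TRUE)
  revnms <- sub("avg_", "rev_", nms)
  if (!overwrite) {
    # skip pairs whose rev_ column already exists
    keep <- !revnms %in% names(dat)
    nms <- nms[keep]
    revnms <- revnms[keep]
  }
  for (i in seq_along(nms)) {
    # the multiplier column is looked up by name at call time
    dat[[revnms[i]]] <- dat[[nms[i]]] * dat[[value.var]]
  }
  dat
}

otherdat <- data.frame(units = c(2, 10), avg_2019 = c(1.5, 2))
func2(otherdat, value.var = "units")
#   units avg_2019 rev_2019
# 1     2      1.5        3
# 2    10      2.0       20
```

Leaving `value.var` at its default keeps the original behavior for frames that do have a `quantity` column.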