
Let's say in R I have a data frame (called df) with a bunch of columns containing integer data named "Var1foo", "Var2foo", and so on.

Now suppose I want to create a new column called sum1 that adds up everything between "Var3foo" and "Var6foo". I might do:

df$sum1 <- rowSums(subset(df, select = Var3foo:Var6foo))

Or, I might do something a bit more complicated and create a new column called foobar with apply() like so:

eenie = 3
meenie = 2
df$foobar <- apply(df, 1, function(x) if (sum(x[2:7]) == eenie && sum(x[1:3]) != meenie) 1 else 0)

The problem is I always have to explicitly write out the column names or index when referring to those columns. What if I want to refer to column "Varxfoo" where x <- 8 or "Varyfoo" where y <- 12?

What I mean is, I wouldn't be able to do df$paste0("Var", x, "foo") or sum(x[paste0("Var", x, "foo"):paste0("Var", y, "foo")]).

I also considered using dplyr::mutate() to create df$sum1 and df$foobar but it seems to also need explicit column (variable) names.

What should I do? Thanks!!

hpy
  • Can't use `$` with string column names, but you can use `[`. See the R-FAQ [Dynamically select data frames column names with $](https://stackoverflow.com/q/18222286/903061). – Gregor Thomas Jun 05 '17 at 18:50

2 Answers

1

You could refer to the column by building its name as a string:

df[paste0("Var", x, "foo")]
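As a minimal sketch of how this behaves, assume a toy data frame with columns "Var1foo" to "Var12foo" (the data and the new column name below are invented for illustration). Single brackets return a one-column data frame, double brackets return the column as a vector, and the same string indexing works on the left-hand side of an assignment:

```r
# Toy data, invented for illustration: 3 rows, 12 integer columns
df <- as.data.frame(matrix(1:36, nrow = 3))
names(df) <- paste0("Var", 1:12, "foo")

x <- 8
col <- paste0("Var", x, "foo")   # builds the string "Var8foo"

df[col]     # a one-column data frame
df[[col]]   # the same column as a plain vector

# String indexing also works for assignment, so a new column
# can be created under a name that is built at run time.
new_col <- paste0("Var", x, "foo_doubled")   # hypothetical column name
df[[new_col]] <- df[[col]] * 2
```

This is the bracket-based counterpart of `df$Var8foo`, which cannot take a computed name.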

If you do this often, a small helper function can reduce the work:

int2name <- function(x, prefix = "", suffix = ""){
    paste0(prefix, x, suffix)
}

And then you can use:

df[int2name(2:4, prefix = "Var", suffix = "foo")]
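Applied to the row-sum case from the question, this might look like the sketch below. The toy data frame is invented for illustration, and the helper is repeated so the snippet runs on its own:

```r
# Helper from above, repeated so this snippet is self-contained
int2name <- function(x, prefix = "", suffix = "") {
    paste0(prefix, x, suffix)
}

# Toy data, invented for illustration: 3 rows, 12 integer columns
df <- as.data.frame(matrix(1:36, nrow = 3))
names(df) <- int2name(1:12, prefix = "Var", suffix = "foo")

x <- 8
y <- 12

# Build the names "Var8foo", ..., "Var12foo" and sum those columns row by row
cols <- int2name(x:y, prefix = "Var", suffix = "foo")
df$sum1 <- rowSums(df[cols])
```

Because the columns are selected by name, this should not depend on where they sit in the data frame, only on the names existing.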
Consistency
  • Thanks! This might work, but to create a new column without using "$" I only know of `dplyr::mutate()`. However, `mutate()` doesn't seem to like me using `paste0()` to create the new column's name... (`mutate_()` also didn't work) What can I do? Thanks! – hpy Jun 05 '17 at 19:24
  • @hpy I think `df[int2name(2:4, prefix = "Var", suffix = "foo")] <- something` can also create new columns, but it is not as elegant as `mutate`. – Consistency Jun 05 '17 at 19:34
1

A simple solution is to reference the columns directly by position:

rowSums(df[, x:y])

Of course, this only works if the columns are in order, so that column "Varxfoo" sits at position x.
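If the number in a column's name does not match its position, one option is to look the position up by name with `match()` and build the range from that. A minimal sketch, assuming a toy data frame with an extra `id` column in front (both invented for illustration):

```r
# Toy data, invented for illustration: an id column shifts every
# "VarNfoo" column one position to the right.
vars <- as.data.frame(matrix(1:36, nrow = 3))
names(vars) <- paste0("Var", 1:12, "foo")
df <- data.frame(id = c("a", "b", "c"), vars)

x <- 8
y <- 12

# Find where the two endpoint columns actually are
from <- match(paste0("Var", x, "foo"), names(df))
to   <- match(paste0("Var", y, "foo"), names(df))

# Sum every column lying between the two endpoints, row by row
df$sum1 <- rowSums(df[from:to])
```

Note that `from:to` takes whatever columns lie between the two endpoints, so this still assumes the columns of interest are contiguous.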

Eldioo