3

I have a data frame whose column names are something like a, b, v1, v2, v3...v100. I want to create a new column by applying a function to only the columns whose names include 'v'.

For example, given this data frame

df <- data.frame(a = rnorm(3), v1 = rnorm(3), v2 = rnorm(3), v3 = rnorm(3))

I want to create a new column in which each element is the sum of the elements of v1, v2 and v3 that are in the same row.

danilinares

3 Answers

6

Use grep on names(df) to get the column positions, then use rowSums:

rowSums(df[,grep("v",names(df))])
James
    Use `df[grep("v",names(df))]` to avoid conversion to vector if only one column is selected. Compare `df[,"v1"]` vs `df["v1"]`. – Marek Sep 13 '11 at 13:03
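The difference described in the comment above can be seen in a minimal sketch. The data frame `df1` and the variable names are hypothetical, chosen so that exactly one column matches:

```r
# Hypothetical data frame with exactly one "v" column
df1 <- data.frame(a = c(1, 2, 3), v1 = c(10, 20, 30))

cols <- grep("v", names(df1))

# With a comma, a single selected column is dropped to a plain vector
is.data.frame(df1[, cols])   # FALSE
# rowSums() on that vector would stop with an error, because it needs
# an array of at least two dimensions

# Without the comma, the result stays a one-column data frame
is.data.frame(df1[cols])     # TRUE
sums <- rowSums(df1[cols])

# drop = FALSE is an alternative that keeps the comma form
sums_alt <- rowSums(df1[, cols, drop = FALSE])
```

With two or more matching columns both forms behave the same, so the difference only matters when the pattern might match a single column.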
3

To combine both @James's and @Anatoliy's answers,

apply(df[grepl('^v', names(df))], 1, sum)

I went ahead and anchored the v in the regular expression to the beginning of the string. The other answers don't do that, but it appears that you want all columns that begin with v, not the larger set that may have a v anywhere in their name. If I am wrong, you could just do

apply(df[grepl('v', names(df))], 1, sum)
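To see when the anchor matters, here is a small sketch with a hypothetical data frame `df2` in which one column name (`avg`) contains a v but does not start with one:

```r
# Hypothetical data frame: "avg" contains a v but does not begin with one
df2 <- data.frame(avg = c(100, 200), v1 = c(1, 2), v2 = c(3, 4))

names(df2)[grepl("^v", names(df2))]  # "v1" "v2"
names(df2)[grepl("v", names(df2))]   # "avg" "v1" "v2"

# Row sums over each selection
anchored   <- apply(df2[grepl("^v", names(df2))], 1, sum)  # 4 6
unanchored <- apply(df2[grepl("v", names(df2))], 1, sum)   # 104 206
```

If the names strictly follow the v1...v100 scheme, a tighter pattern such as "^v[0-9]+$" would be another option; whether that is needed depends on the real column names.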

You should avoid using subset() when programming, as stated in ?subset:

This is a convenience function intended for use interactively. For programming it is better to use the standard subsetting functions like ‘[’, and in particular the non-standard evaluation of argument ‘subset’ can have unanticipated consequences.

Also, as I learned yesterday from Richie Cotton, when indexing it is better to use grepl than grep.
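One commonly cited reason for preferring grepl, which may or may not be the one meant here, is the case where nothing matches and the index is negated. A sketch with a hypothetical data frame `df3` that has no v columns:

```r
# Hypothetical data frame with no "v" columns
df3 <- data.frame(a = 1:3, b = 4:6)

idx_int <- grep("^v", names(df3))    # integer(0)
idx_lgl <- grepl("^v", names(df3))   # FALSE FALSE

# Intent: drop the "v" columns, so both columns should survive
ncol(df3[-idx_int])   # 0: -integer(0) is still integer(0), so nothing is selected
ncol(df3[!idx_lgl])   # 2
```

The logical index always has one element per column, so negating it behaves as expected even when there are no matches.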

adamleerich
2

This should do it:

df$sums <- rowSums(subset(df, select = grepl("v", names(df))))

For a more general approach:

apply(subset(df, select=grepl("v", names(df))), 1, sum)
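As a sketch of what "more general" could mean here: apply() accepts any function of a row, not only sum. The data frame `df4`, the seed, and the new column names below are assumptions made for illustration:

```r
set.seed(1)  # only so the sketch is reproducible
df4 <- data.frame(a = rnorm(3), v1 = rnorm(3), v2 = rnorm(3), v3 = rnorm(3))

# Select the "v" columns once, before adding any new columns
v_cols <- subset(df4, select = grepl("v", names(df4)))

# For sums, apply() and rowSums() give the same values
sums_apply <- apply(v_cols, 1, sum)
sums_fast  <- rowSums(v_cols)

# apply() also works with functions that have no dedicated row-wise helper
df4$row_max   <- apply(v_cols, 1, max)
df4$row_range <- apply(v_cols, 1, function(x) max(x) - min(x))
```

For plain sums or means, rowSums() and rowMeans() are usually the faster choice; apply() is useful when the row-wise function is something else.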
Anatoliy