7

I have some trouble with a script which uses cbind to add columns to a data frame. I select these columns by regular expression, and I love that cbind automatically adds a prefix if you add more than one column. But this does not work if you append just one column, even if I cast that column to a data frame.

Is there a way to get around this behaviour?

In my example, it works fine for the columns starting with a, but not for the b1 column.

df <- data.frame(a1=c(1,2,3),a2=c(3,4,5),b1=c(6,7,8))

cbind(df, log=log(df[grep('^a', names(df))]))

cbind(df, log=log(df[grep('^b', names(df))]))

cbind(df, log=as.data.frame(log(df[grep('^b', names(df))])))
Randyka Yudhistira
drmariod

3 Answers

2

A solution would be to create an intermediate data frame with the log values and rename the columns:

logb = log(df[grep('^b', names(df))])
colnames(logb) = paste0('log.',names(logb))
cbind(df, logb)
Math
  • Works so far, but it is hard to read. It is a pity that there is no built-in solution in cbind. I wonder where exactly the problem lies inside cbind, or what kind of logic it follows. – drmariod Feb 18 '15 at 09:29
  • To make it easier to read, you can wrap it in a function such as `mycbind <- function(df, pattern)` and simply call that instead of `cbind`. – Math Feb 18 '15 at 09:38
  • You don't need to use `grep` twice here; you can simplify your second row to just `colnames(logb) = paste0('log.', names(logb))` – David Arenburg Feb 18 '15 at 09:40
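The wrapper suggested in the comments could look roughly like the sketch below. Only the name `mycbind` and its first two arguments come from the comment; the `prefix` argument, the early return when nothing matches, and the choice to hardcode `log` are assumptions made for illustration.

```r
# Hypothetical helper sketched from the comment above; `prefix` is an
# assumed extra argument, not something the original comment specified.
mycbind <- function(df, pattern, prefix = "log.") {
  idx <- grep(pattern, names(df))
  # Nothing matched: return the data frame unchanged.
  if (length(idx) == 0) return(df)
  # drop = FALSE keeps a data frame even when only one column matches.
  logged <- log(df[, idx, drop = FALSE])
  # Add the prefix explicitly, so one column is treated like many.
  names(logged) <- paste0(prefix, names(logged))
  cbind(df, logged)
}

df <- data.frame(a1 = c(1, 2, 3), a2 = c(3, 4, 5), b1 = c(6, 7, 8))
names(mycbind(df, "^b"))  # "a1" "a2" "b1" "log.b1"
names(mycbind(df, "^a"))  # "a1" "a2" "b1" "log.a1" "log.a2"
```

Because the prefix is set by hand, the result no longer depends on how many columns the pattern happens to select.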
2

What about

cbw <- c("a","b") # columns beginning with
cbw_pattern <- paste0("^",cbw, collapse = "|")
cbind(df, log=log(df[grep(cbw_pattern, names(df))]))

This way you select both patterns at once (all three columns).
The column names only lack the prefix when just a single column is selected.

Rentrop
  • The question is specifically about the case where only one column is selected. The a1 and a2 columns were just an example to show that there are other columns as well and that it works as expected for more than one column. So I am really looking for a "one column solution". – drmariod Feb 18 '15 at 09:28
0

While the OP's use case is probably better solved by switching to a long format, the problem of a missing prefix for named single column arguments to cbind.data.frame persists and is perhaps highlighted better in the following example:

cbind(x=data.frame(a=1,b=2), y=data.frame(a=3, b=4))
#>   x.a x.b y.a y.b
#> 1   1   2   3   4
cbind(x=data.frame(a=1,b=2), y=data.frame(a=3))
#>   x.a x.b a
#> 1   1   2 3

Note how the a column in the second case lacks the y. prefix. This odd behavior is documented in the Value section of ?data.frame (which cbind calls for data.frame arguments):

How the names of the data frame are created is complex, and the rest of this paragraph is only the basic story. [...] For a named matrix/list/data frame argument with more than one named column, the names of the columns are the name of the argument followed by a dot and the column name inside the argument: if the argument is unnamed, the argument's column names are used. For a named or unnamed matrix/list/data frame argument that contains a single column, the column name in the result is the column name in the argument.
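Applied to the data frame from the question, the documented rule can be checked directly. This is a minimal sketch using only base R; the variable names `multi` and `single` are chosen here for illustration.

```r
df <- data.frame(a1 = c(1, 2, 3), a2 = c(3, 4, 5), b1 = c(6, 7, 8))

# Named argument with two columns: argument name + "." + column name.
multi <- cbind(df, log = log(df[grep("^a", names(df))]))
names(multi)   # "a1" "a2" "b1" "log.a1" "log.a2"

# Named argument with a single column: the argument name is ignored and
# the inner column name is kept, which duplicates "b1".
single <- cbind(df, log = log(df[grep("^b", names(df))]))
names(single)  # "a1" "a2" "b1" "b1"
```

The duplicated b1 survives because cbind does not repair column names for data frames, so the logged column is easy to miss in the output.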

One possible workaround (with several positive side effects) is switching from base data.frame to data.table. It handles the column names more consistently:

library(data.table)
cbind(x=data.table(a=1,b=2), y=data.frame(a=3))
#>      x.a   x.b   y.a
#>    <num> <num> <num>
#> 1:     1     2     3
jan-glx