You can probably solve this with the plain vanilla R function paste():
# Create OP's dummy data frame
df <- data.frame("Column A"=c(1,2),
                 "Column B"=c(2,3), "Column C"=c(3,4),
                 "Sample_ID"=c("abc", "def"),
                 "Sample_ID2"=c(123, 456))
paste_concat <- function(x) {
  # Paste the two columns together without a separator, as per the desired output
  x["Sample_ID+Sample_ID2"] <- paste(x$Sample_ID, x$Sample_ID2, sep="")
  return(x)
}
print(lapply(list(df, df), paste_concat))
> [[1]]
>   Column.A Column.B Column.C Sample_ID Sample_ID2 Sample_ID+Sample_ID2
> 1        1        2        3       abc        123               abc123
> 2        2        3        4       def        456               def456
>
> [[2]]
>   Column.A Column.B Column.C Sample_ID Sample_ID2 Sample_ID+Sample_ID2
> 1        1        2        3       abc        123               abc123
> 2        2        3        4       def        456               def456
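As a side note, paste0() is base R shorthand for paste(..., sep=""), so the helper can be written a bit more compactly. The following is only a minimal sketch: the name paste0_concat is made up for illustration, and the data frame is trimmed down to the two columns that matter.

```r
# Trimmed-down version of the dummy data (only the two ID columns)
df <- data.frame(Sample_ID  = c("abc", "def"),
                 Sample_ID2 = c(123, 456))

# paste0() behaves like paste(..., sep = "")
# (paste0_concat is a hypothetical name, not from the original question)
paste0_concat <- function(x) {
  x[["Sample_ID+Sample_ID2"]] <- paste0(x$Sample_ID, x$Sample_ID2)
  x
}

res <- lapply(list(df, df), paste0_concat)
res[[1]][["Sample_ID+Sample_ID2"]]
# [1] "abc123" "def456"
```

Because the new column name contains a "+", it has to be accessed with [[ ]] or backticks rather than a bare $ name.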
To see whether paste() performs better than tidyr::unite(), I tried a little benchmark:
install.packages(c("rbenchmark", "tidyr"))
library("rbenchmark")
library("tidyr")
paste_concat <- function(x) {
  x["Sample_ID+Sample_ID2"] <- paste(x$Sample_ID, x$Sample_ID2, sep="")
  return(x)
}
unity_concat <- function(x) {
  # Removing the '-' to be consistent with OP's original expected output
  y <- unite(x, col='Sample_ID+Sample_ID2', c('Sample_ID', 'Sample_ID2'), sep='')
  return(y)
}
benchmark(lapply(list(df, df), paste_concat),
          lapply(list(df, df), unity_concat),
          replications=1000,
          columns=c('test', 'elapsed', 'replications'))
>                                 test elapsed replications
> 1 lapply(list(df, df), paste_concat)   0.113         1000
> 2 lapply(list(df, df), unity_concat)   3.743         1000
Based on these results (0.113 s vs. 3.743 s elapsed over 1000 replications), I'd say the function based on paste() is about 33 times faster, at least on these tiny two-row data frames!
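Since the benchmark above only touches two-row data frames, it mostly measures per-call overhead. Here is a rough sketch of how the comparison could be repeated on larger input. The 100,000-row big_df is made-up data, and remove = FALSE is my addition so that unite() keeps the input columns like the paste() version does. I am not quoting timings here because they will differ from machine to machine.

```r
library("tidyr")

# Hypothetical larger input: 100,000 rows instead of 2
n <- 1e5
big_df <- data.frame(Sample_ID  = sample(c("abc", "def"), n, replace = TRUE),
                     Sample_ID2 = sample(100:999, n, replace = TRUE))

paste_concat <- function(x) {
  x["Sample_ID+Sample_ID2"] <- paste(x$Sample_ID, x$Sample_ID2, sep = "")
  x
}

unity_concat <- function(x) {
  # remove = FALSE keeps Sample_ID and Sample_ID2 in the result
  unite(x, col = "Sample_ID+Sample_ID2", c("Sample_ID", "Sample_ID2"),
        sep = "", remove = FALSE)
}

# Sanity check: both approaches should build the same strings
same_values <- identical(paste_concat(big_df)[["Sample_ID+Sample_ID2"]],
                         unity_concat(big_df)[["Sample_ID+Sample_ID2"]])
print(same_values)

# Base R timing over 20 repetitions each
print(system.time(for (i in 1:20) paste_concat(big_df)))
print(system.time(for (i in 1:20) unity_concat(big_df)))
```

If the gap shrinks a lot on the larger data, the 33x figure is mainly fixed overhead in unite() rather than slower string concatenation.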