I have data frames in a list. For each data frame I need to combine two columns into one (Sample_ID and Sample_ID2 in the example below). One is numeric and the other is not. I tried to write a function for this, but it doesn't work:

df <- data.frame("Column A"=c(1), "Column B"=c(2), 
  "Column C"=c(3), "Sample_ID"=c("abc"), "Sample_ID2"=c(123))

my_function <- function(fn){
  new <- do.call(paste0, final_list1[c("Sample_ID", "Sample_ID2")]) 
  return(new)
}

final_list2 <- lapply(list(df), my_function)

Also, unite() from the tidyverse doesn't work either.

A dataframe in my list looks like:

Column A  Column B  Column C  Sample_ID  Sample_ID2
1         2         3         abc        123

My desired output is:

Column A  Column B  Column C  Sample_ID+Sample_ID2
1         2         3         abc123
Stereo

3 Answers


Here is an option using transpose() from purrr, which turns a list of vectors into a list of lists. Note that the new column in the output below is a list column:

fn <- data.frame(A = 1,
                 B = 2,
                 C = 3,
                 Sample_ID = "abc",
                 Sample_ID2 = "123")

library(data.table)
setDT(fn)[, `Sample_ID+Sample_ID2` := purrr::transpose(.("Sample_ID","Sample_ID2"))][,c("Sample_ID", "Sample_ID2"):=NULL]
fn
#>    A B C Sample_ID+Sample_ID2
#> 1: 1 2 3            <list[2]>

Created on 2022-07-06 by the reprex package (v2.0.1)

Quinten
  • Warning messages: 1: In setDT(final_list1) : Some columns are a multi-column type (such as a matrix column): [1, 2, 3]. setDT will retain these columns as-is but subsequent operations like grouping and joining may fail. Please consider as.data.table() instead which will create a new column for each embedded column. 2: In `[.data.table`(setDT(final_list1)[, `:=`(`Sample_ID+Sample_ID2`, : Column 'Sample_ID' does not exist to remove 3: In `[.data.table`(setDT(final_list1)[, `:=`(`Sample_ID+Sample_ID2`, : Column 'Sample_ID2' does not exist to remove. Because fn is a list? – Sara Angela Gallarati Jul 06 '22 at 19:02
  • Hi @SaraAngelaGallarati, could you please share your data using dput(fn)? – Quinten Jul 06 '22 at 19:08
  • It's too long. What info do you need? – Sara Angela Gallarati Jul 06 '22 at 21:37

library(tidyr)

myfunction <- function(fn) {
  # Column names follow the question's example (Sample_ID, Sample_ID2)
  new <- unite(fn, col = 'SampleID', c('Sample_ID', 'Sample_ID2'), sep = '-')
  return(new)
}
final_list2 <- lapply(final_list1, myfunction)

Solved it
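
For comparison, the function in the question failed because its body indexed the whole list (final_list1) instead of its argument fn. Below is a minimal base R sketch of the same per-data-frame pattern. It assumes every data frame has columns named Sample_ID and Sample_ID2, as in the question's example; combine_ids and example_list are hypothetical names used only for illustration.

```r
# Hypothetical stand-in for final_list1: a list of data frames
example_list <- list(
  data.frame(A = 1, Sample_ID = "abc", Sample_ID2 = 123),
  data.frame(A = 2, Sample_ID = "def", Sample_ID2 = 456)
)

combine_ids <- function(fn) {
  # Work on the argument fn (one data frame), not on the enclosing list
  fn[["Sample_ID+Sample_ID2"]] <- paste0(fn[["Sample_ID"]], fn[["Sample_ID2"]])
  # Drop the two source columns to match the desired output
  fn[c("Sample_ID", "Sample_ID2")] <- NULL
  fn
}

result <- lapply(example_list, combine_ids)
```

lapply() passes one data frame at a time to combine_ids, so everything inside the function should refer to fn only.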


You can probably solve this with the base R function paste():

# Create the OP's dummy data frame
df <- data.frame("Column A"=c(1,2), 
  "Column B"=c(2,3), "Column C"=c(3,4), 
  "Sample_ID"=c("abc", "def"), 
  "Sample_ID2"=c(123, 456))

paste_concat <- function(x) {
  # Paste the two columns together without a space as per desired output
  x["Sample_ID+Sample_ID2"] <- paste(x$Sample_ID, x$Sample_ID2, sep="")
  return(x)
}

print(lapply(list(df, df), paste_concat))

> [[1]]
>   Column.A Column.B Column.C Sample_ID Sample_ID2 Sample_ID+Sample_ID2
> 1        1        2        3       abc        123               abc123
> 2        2        3        4       def        456               def456
>
> [[2]]
>   Column.A Column.B Column.C Sample_ID Sample_ID2 Sample_ID+Sample_ID2
> 1        1        2        3       abc        123               abc123
> 2        2        3        4       def        456               def456
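
If more than two columns ever need to be combined, the do.call(paste0, ...) idea from the question can be kept. This is only a sketch: paste_cols and ids_df are hypothetical names, and the default name of the new column (the source names joined with "+") is an assumption based on the desired output.

```r
# Small example data frame (hypothetical name, same values as above)
ids_df <- data.frame(Sample_ID = c("abc", "def"), Sample_ID2 = c(123, 456))

# Hypothetical helper: concatenate any set of columns row by row
paste_cols <- function(x, cols, new_name = paste(cols, collapse = "+")) {
  # x[cols] is a list of columns; do.call passes each column to paste0
  # as a separate argument. unname() stops column names from being
  # matched against paste0's own argument names (e.g. "collapse").
  x[[new_name]] <- do.call(paste0, unname(as.list(x[cols])))
  x
}

out <- lapply(list(ids_df, ids_df), paste_cols,
              cols = c("Sample_ID", "Sample_ID2"))
```

Extra arguments given to lapply() after the function are forwarded to it, so the column names only have to be written once.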

To see whether paste() performs better than tidyr::unite(), I tried a little benchmark:

install.packages(c("rbenchmark", "tidyr"))
library("rbenchmark")
library("tidyr")

paste_concat <- function(x) {
  x["Sample_ID+Sample_ID2"] <- paste(x$Sample_ID, x$Sample_ID2, sep="")
  return(x)
}

unity_concat <- function(x) {
  # Removing the '-' to be consistent with the OP's original expected output
  y <- unite(x, col='Sample_ID+Sample_ID2', c('Sample_ID', 'Sample_ID2'), sep='')
  return(y)
}

benchmark(lapply(list(df, df), paste_concat),
          lapply(list(df, df), unity_concat),
          replications=1000,
          columns=c('test', 'elapsed', 'replications'))

>                                 test elapsed replications
> 1 lapply(list(df, df), paste_concat)   0.113         1000
> 2 lapply(list(df, df), unity_concat)   3.743         1000

Based on these results, I'd say the paste()-based function is roughly 33 times faster on this small example (0.113 s versus 3.743 s elapsed over 1000 replications).

Stereo