Combine two columns of each data frame in a list, one numerical and one character

Question

I have data frames in a list. For each data frame I need to combine two columns, A and B; one is numerical and the other is not. I tried to write a function for this, but it doesn't work:

df <- data.frame("Column A"=c(1), "Column B"=c(2), 
  "Column C"=c(3), "Sample_ID"=c("abc"), "Sample_ID2"=c(123))

my_function <- function(fn){
  new <- do.call(paste0, final_list1[c("Sample_ID", "Sample_ID2")]) 
  return(new)
}

final_list2 <- lapply(list(df), my_function)

Also, `unite()` from the tidyverse doesn't work.

A data frame in my list looks like this:

Column A	Column B	Column C	Sample_ID	Sample_ID2
1	2	3	abc	123

My desired output is:

Column A	Column B	Column C	Sample_ID+Sample_ID2
1	2	3	abc123

Where is the `fn` argument used inside `my_function`? I would assume you want `final_list` as an argument, i.e. `my_function <- function(final_list, colnms) {final_list[[paste(colnms, collapse = "+")]] <- do.call(paste0, final_list[colnms]); final_list}` — akrun, Jul 06 '22 at 16:17
You can do this easily by writing a function and using `tidyr::unite()` inside it. Could you please show the code you tried with `unite()`? — shafee, Jul 06 '22 at 16:44
`unite()` can't combine columns of different types; that's the error I get. — Sara Angela Gallarati, Jul 06 '22 at 17:06
Error in UseMethod("unite") : no applicable method for 'unite' applied to an object of class "list" — Sara Angela Gallarati, Jul 06 '22 at 22:06
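A runnable base R version of the function akrun sketches in the comment, applied with `lapply()` to the question's example data frame stored twice in a list (`final_list1`, the name used in the question). `paste0()` converts its arguments with `as.character()`, so pasting a character column and a numeric column together works as is; the `UseMethod("unite")` error quoted above is about the object being a list, since these functions operate on one data frame at a time. This is a sketch under those assumptions, not a tested drop-in for the asker's real data.

```r
# akrun's suggested function: add a column named "<col1>+<col2>" that pastes
# the given columns together, keeping the original columns
my_function <- function(final_list, colnms) {
  # do.call(paste0, df[colnms]) is paste0(df$Sample_ID, df$Sample_ID2);
  # paste0() converts the numeric column with as.character()
  final_list[[paste(colnms, collapse = "+")]] <- do.call(paste0, final_list[colnms])
  final_list
}

# The question's example data frame, stored twice in a list
df <- data.frame("Column A" = c(1), "Column B" = c(2), "Column C" = c(3),
                 "Sample_ID" = c("abc"), "Sample_ID2" = c(123))
final_list1 <- list(df, df)

# Apply the function to each data frame in the list, not to the list itself
final_list2 <- lapply(final_list1, my_function,
                      colnms = c("Sample_ID", "Sample_ID2"))
final_list2[[1]]
```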

score 0 · Answer 1 · answered Jul 06 '22 at 17:51


One option is `transpose()` from purrr, which transposes a list of vectors into a list, like this:

fn <- data.frame(A = 1,
                 B = 2,
                 C = 3,
                 Sample_ID = "abc",
                 Sample_ID2 = "123")

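library(data.table)
# Refer to the columns unquoted so their values, not the strings "Sample_ID"/"Sample_ID2", are transposed
setDT(fn)[, `Sample_ID+Sample_ID2` := purrr::transpose(.(Sample_ID, Sample_ID2))][, c("Sample_ID", "Sample_ID2") := NULL]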
fn
#>    A B C Sample_ID+Sample_ID2
#> 1: 1 2 3            <list[2]>

Created on 2022-07-06 by the reprex package (v2.0.1)
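The result above is a list column holding a two-element list per row (`<list[2]>`), not the single string shown in the desired output. A base R sketch of the same transpose-then-combine idea, using `Map()` in place of `purrr::transpose()` (a substitution, not what this answer uses) and then collapsing each row's pair into one string with `paste0()`; the two-row data frame is made up for illustration:

```r
df <- data.frame(A = c(1, 2),
                 Sample_ID = c("abc", "def"),
                 Sample_ID2 = c(123, 456))

# "Transpose" the two columns: one list(Sample_ID, Sample_ID2) per row
pairs <- Map(list, df$Sample_ID, df$Sample_ID2, USE.NAMES = FALSE)

# Collapse each pair into one string; paste0() converts the list elements
# with as.character(), so the numeric ID becomes "123"
df[["Sample_ID+Sample_ID2"]] <- vapply(pairs, paste0, character(1), collapse = "")

# Drop the original columns to match the desired output
df[c("Sample_ID", "Sample_ID2")] <- NULL
df
```

For this particular task the row-wise step is not strictly needed, because `paste0()` is already vectorised over whole columns; the sketch only shows how the transposed pairs can be turned into strings.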

answered Jul 06 '22 at 17:51

Quinten


Warning messages: 1: In setDT(final_list1) : Some columns are a multi-column type (such as a matrix column): [1, 2, 3]. setDT will retain these columns as-is but subsequent operations like grouping and joining may fail. Please consider as.data.table() instead which will create a new column for each embedded column. 2: In `[.data.table`(setDT(final_list1)[, `:=`(`Sample_ID+Sample_ID2`, : Column 'Sample_ID' does not exist to remove 3: In `[.data.table`(setDT(final_list1)[, `:=`(`Sample_ID+Sample_ID2`, : Column 'Sample_ID2' does not exist to remove. Is that because fn is a list? – Sara Angela Gallarati Jul 06 '22 at 19:02
Hi @SaraAngelaGallarati, could you please share your data using dput(fn)? – Quinten Jul 06 '22 at 19:08
It's too long. What info do you need? – Sara Angela Gallarati Jul 06 '22 at 21:37

score 0 · Accepted Answer · answered Jul 06 '22 at 23:37


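library(tidyr)

myfunction <- function(fn) {
  # unite() works on a single data frame, so it is applied to each list element below
  new <- unite(fn, col = 'SampleID', c('Sample_ID', 'Sample_ID2'), sep = '-')
  return(new)
}
final_list2 <- lapply(final_list1, myfunction)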

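Solved it. The earlier `Error in UseMethod("unite")` came from calling `unite()` on the list itself; applying it to each data frame with `lapply()` works.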
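For comparison, a base R sketch of what the `unite()` call above does (paste the columns, put the result where the first source column was, and drop the originals), here with `sep = ""` and the column name from the desired output. The helper `unite_base` is hypothetical and not part of tidyr; it only approximates `unite()` with its default `remove = TRUE`.

```r
# Hypothetical helper (not part of tidyr): rough base R equivalent of
# tidyr::unite() with remove = TRUE
unite_base <- function(df, col, from, sep = "") {
  pos <- match(from[1], names(df))       # position of the first source column
  # paste(df[[from[1]]], df[[from[2]]], ..., sep = sep); numbers go through as.character()
  df[[col]] <- do.call(paste, c(unname(as.list(df[from])), sep = sep))
  keep <- setdiff(names(df), c(from, col))
  # Place the new column where the first source column used to be
  n_before <- sum(match(keep, names(df)) < pos)
  df[append(keep, col, after = n_before)]
}

# The question's example data frame, stored twice in a list
df <- data.frame("Column A" = c(1), "Column B" = c(2), "Column C" = c(3),
                 "Sample_ID" = c("abc"), "Sample_ID2" = c(123))
final_list1 <- list(df, df)

final_list2 <- lapply(final_list1, unite_base,
                      col = "Sample_ID+Sample_ID2",
                      from = c("Sample_ID", "Sample_ID2"))
final_list2[[1]]
```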

answered Jul 06 '22 at 23:37

Sara Angela Gallarati


Stereo · Answer 3 · 2022-07-08T08:42:41.503

You can probably solve this with the plain base R function `paste()`:

# Create the OP's dummy data frame
df <- data.frame("Column A"=c(1,2), 
  "Column B"=c(2,3), "Column C"=c(3,4), 
  "Sample_ID"=c("abc", "def"), 
  "Sample_ID2"=c(123, 456))

paste_concat <- function(x) {
  # Paste the two columns together without a space as per desired output
  x["Sample_ID+Sample_ID2"] <- paste(x$Sample_ID, x$Sample_ID2, sep="")
  return(x)
}

print(lapply(list(df, df), paste_concat))

> [[1]]
>   Column.A Column.B Column.C Sample_ID Sample_ID2 Sample_ID+Sample_ID2
> 1        1        2        3       abc        123               abc123
> 2        2        3        4       def        456               def456
>
> [[2]]
>   Column.A Column.B Column.C Sample_ID Sample_ID2 Sample_ID+Sample_ID2
> 1        1        2        3       abc        123               abc123
> 2        2        3        4       def        456               def456
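One thing to watch when one of the columns is numeric: `paste()` converts numbers with `as.character()`, so a large ID stored as a double may not come out the way it looks in the data (for example, `100000` can be written in scientific notation as `1e+05`). A hedged sketch that formats the numeric column explicitly before pasting, assuming the IDs are whole numbers; the data frame is made up for illustration:

```r
# Example where the numeric ID is large enough to be affected
df <- data.frame(Sample_ID = c("abc", "def"),
                 Sample_ID2 = c(123, 100000))

# Default conversion via as.character(); 100000 may come out as "1e+05"
paste(df$Sample_ID, df$Sample_ID2, sep = "")

# Format the numbers in fixed notation first; trim = TRUE stops format()
# from padding "123" to the width of "100000"
ids <- format(df$Sample_ID2, scientific = FALSE, trim = TRUE)
df[["Sample_ID+Sample_ID2"]] <- paste(df$Sample_ID, ids, sep = "")
df
```

`format()` is used here rather than `sprintf()` so that the same line also copes with non-integer values.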

To see whether paste() performs better than tidyr::unite(), I tried a little benchmark:

install.packages(c("rbenchmark", "tidyr"))
library("rbenchmark")
library("tidyr")

paste_concat <- function(x) {
  x["Sample_ID+Sample_ID2"] <- paste(x$Sample_ID, x$Sample_ID2, sep="")
  return(x)
}

unity_concat <- function(x) {
  # No '-' separator, to match the OP's original expected output
  y <- unite(x, col='Sample_ID+Sample_ID2', c('Sample_ID', 'Sample_ID2'), sep='')
  return(y)
}

benchmark(lapply(list(df, df), paste_concat),
          lapply(list(df, df), unity_concat),
          replications=1000,
          columns=c('test', 'elapsed', 'replications'))

>                                 test elapsed replications
> 1 lapply(list(df, df), paste_concat)   0.113         1000
> 2 lapply(list(df, df), unity_concat)   3.743         1000

Based on these results, the function based on paste() looks about 33 times faster (0.113 s vs 3.743 s elapsed over 1000 replications), at least on this small two-row example!
