This is very similar to (but slightly different from) the question "Paste element name onto columns in each list element", so I understand if this gets flagged as a duplicate.

I am trying to use deparse() and paste() as part of a function that does more than just rename columns. The rest of the function works fine, but deparse() and paste() behave differently when the function is mapped over data frames in a list, and I'm curious why.

I have a list of unique dataframes of varying length with non-unique column names (all columns are named "barcode" and "count"). I've named the list elements like so:

list1 <- list(ute1, dec1, dec2, dec3, dec4, dec5, dec6, dec7, dec8, pla1, pla2, pla3, pla4, pla5, pla6, pla7, pla8)
names <- c("ute1", "dec1", "dec2", "dec3", "dec4", "dec5", "dec6", "dec7", "dec8",
           "pla1", "pla2", "pla3", "pla4", "pla5", "pla6", "pla7", "pla8")
names(list1) <- names
> list1
$ute1
# A tibble: 33 × 2
   barcode                           count
   <chr>                             <dbl>
 1 CTGACAGTTACCGTTACAGCAGCCACGCTTCTG  1589
 2 CTGACGGTAACAGTGACAGCAGCAACCCTACTT   252
 3 CTCACCGTAACAGTAACGGCGGCGACCCTGCTG   145
 4 CTAACTGTAACCGTCACTGCGGCAACTCTGCTC   137
 5 CTCACGGTGACCGTAACCGCAGCGACTCTGCTC   136
 6 CTGACAGTTACTGTCACTGCTGCTACCCTCCTA    98
 7 CTTACTGTCACCGTTACCGCAGCAACACTACTT    95
 8 CTTACGGTCACGGTAACTGCAGCTACCCTTCTA    51
 9 CTTACAGTTACCGTTACAGCAGCCACACTCCTG    50
10 CTGACTGTTACGGTTACAGCAGCAACCCTACTC    44
# … with 23 more rows
# ℹ Use `print(n = ...)` to see more rows

I am trying to map the following function over list1:

barcode_frequency <- function(x) {
  df_name <- deparse(substitute(x))
  totalumis <- sum(x$count)
  x$freq <- ((x$count)/totalumis)
  colnames(x) <- paste(colnames(x), df_name, sep = "_")
  print(x)
}

list2 <- map(list1, barcode_frequency)

This function works perfectly when applied to an individual dataframe:

> barcode_frequency(ute1)
# A tibble: 33 × 3
   barcode_ute1                      count_ute1 freq_ute1
   <chr>                                  <dbl>     <dbl>
 1 CTGACAGTTACCGTTACAGCAGCCACGCTTCTG       1589    0.596 
 2 CTGACGGTAACAGTGACAGCAGCAACCCTACTT        252    0.0946
 3 CTCACCGTAACAGTAACGGCGGCGACCCTGCTG        145    0.0544
 4 CTAACTGTAACCGTCACTGCGGCAACTCTGCTC        137    0.0514
 5 CTCACGGTGACCGTAACCGCAGCGACTCTGCTC        136    0.0511
 6 CTGACAGTTACTGTCACTGCTGCTACCCTCCTA         98    0.0368
 7 CTTACTGTCACCGTTACCGCAGCAACACTACTT         95    0.0357
 8 CTTACGGTCACGGTAACTGCAGCTACCCTTCTA         51    0.0191
 9 CTTACAGTTACCGTTACAGCAGCCACACTCCTG         50    0.0188
10 CTGACTGTTACGGTTACAGCAGCAACCCTACTC         44    0.0165 

But when mapped over list1, the column names come out wrong:

> list2 <- map(list1, barcode_frequency)
# A tibble: 33 × 3
   `barcode_.x[[i]]`                 `count_.x[[i]]` `freq_.x[[i]]`
   <chr>                                       <dbl>          <dbl>
 1 CTGACAGTTACCGTTACAGCAGCCACGCTTCTG            1589         0.596 
 2 CTGACGGTAACAGTGACAGCAGCAACCCTACTT             252         0.0946
 3 CTCACCGTAACAGTAACGGCGGCGACCCTGCTG             145         0.0544
 4 CTAACTGTAACCGTCACTGCGGCAACTCTGCTC             137         0.0514
 5 CTCACGGTGACCGTAACCGCAGCGACTCTGCTC             136         0.0511
 6 CTGACAGTTACTGTCACTGCTGCTACCCTCCTA              98         0.0368
 7 CTTACTGTCACCGTTACCGCAGCAACACTACTT              95         0.0357
 8 CTTACGGTCACGGTAACTGCAGCTACCCTTCTA              51         0.0191
 9 CTTACAGTTACCGTTACAGCAGCCACACTCCTG              50         0.0188
10 CTGACTGTTACGGTTACAGCAGCAACCCTACTC              44         0.0165

deparse() and paste() don't function as I would expect. Does anyone know why this is? Is there a specific syntax I need to use to get deparse() to recognize the name of the list element?
Ritchie Sacramento
fitz_meyer
  • It appears that `map()` doesn't pass down the argument by name, but as if it were in a loop as `.x[[i]]`. If you do `debugonce(barcode_frequency)` and then run the `map()` function, you'll see that `df_name` gets the value `.x[[i]]`. – DaveArmstrong Aug 28 '23 at 23:52
  • See https://stackoverflow.com/questions/9950144/access-lapply-index-names-inside-fun for how to go about this. – Ritchie Sacramento Aug 28 '23 at 23:54
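
The behaviour described in these comments can be reproduced in base R without purrr. The sketch below is only an illustration: `show_arg` is a hypothetical helper, and the small `ute1` data frame is made-up stand-in data. Base `lapply()` spells the internal expression `X[[i]]`, whereas the `map()` output in the question shows `.x[[i]]`, but the mechanism is the same: `substitute()` captures the expression written at the call site, not the name of the list element.

```r
# Hypothetical helper: returns the expression supplied for `x`, as text.
show_arg <- function(x) deparse(substitute(x))

# Made-up stand-in for one of the data frames in the question.
ute1 <- data.frame(barcode = c("A", "B"), count = c(2, 3))

# Called directly, the call site contains the symbol `ute1`.
direct <- show_arg(ute1)
direct
#> [1] "ute1"

# Called through lapply(), the call site is FUN(X[[i]], ...),
# so that expression is what substitute() sees.
mapped <- lapply(list(ute1 = ute1), show_arg)
mapped[["ute1"]]
#> [1] "X[[i]]"
```

Note that the result list still carries the element name (`mapped[["ute1"]]`); only the function body has no access to it.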

1 Answer

The comments explain why deparse(substitute(x)) doesn't work the way you expect. You could solve the problem by using map2() and modifying your function to take two arguments: the data frame and the name to append to each column.

library(dplyr)
library(purrr)
ute1 <- data.frame(
  barcode = c("A", "B"), 
  count = c(2,3)
)
dec1 <- data.frame(
  barcode = c("C", "D"), 
  count = c(4,5)
)

list1 <- list(ute1, dec1)
names <- c("ute1", "dec1")
names(list1) <- names

barcode_frequency2 <- function(x, name) {
  totalumis <- sum(x[["count"]])
  x[["freq"]] <- ((x[["count"]])/totalumis)
  names(x) <- paste(names(x), name, sep = "_")
  print(x)
}

list2 <- map2(list1, names(list1), barcode_frequency2)
#>   barcode_ute1 count_ute1 freq_ute1
#> 1            A          2       0.4
#> 2            B          3       0.6
#>   barcode_dec1 count_dec1 freq_dec1
#> 1            C          4 0.4444444
#> 2            D          5 0.5555556

Created on 2023-08-28 with reprex v2.0.2
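
As a possible shorthand for the same idea, purrr's `imap()` passes each element together with its name, so `names(list1)` does not have to be supplied separately. This is a minimal sketch under the assumption that the list is named; `add_freq` is a hypothetical variant of `barcode_frequency2` that returns the data frame rather than printing it.

```r
library(purrr)

ute1 <- data.frame(barcode = c("A", "B"), count = c(2, 3))
dec1 <- data.frame(barcode = c("C", "D"), count = c(4, 5))
list1 <- list(ute1 = ute1, dec1 = dec1)

# Hypothetical variant of barcode_frequency2: same logic, no print().
add_freq <- function(x, name) {
  x[["freq"]] <- x[["count"]] / sum(x[["count"]])
  names(x) <- paste(names(x), name, sep = "_")
  x
}

# For a named list, imap() calls add_freq(element, element_name).
list2 <- imap(list1, add_freq)
names(list2[["dec1"]])
#> [1] "barcode_dec1" "count_dec1"   "freq_dec1"
```

Without purrr, `Map(add_freq, list1, names(list1))` should give the same result in base R.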

DaveArmstrong
  • In this regard, `map` is acting exactly as would `lapply`. `lapply` does however use names when labeling the results. – IRTFM Aug 29 '23 at 00:12
  • This seems to work, thank you very much! I missed that in the comments of the previous thread. – fitz_meyer Aug 29 '23 at 19:01