
I'm learning as I go, and I have created the nested data frame below for some proteomics data.

dat_calc <- structure(list(Replicate = 1:5, data = list(structure(list(`Peptide Modified Sequence` = c("ABC", 
"DEF", "GHI", 
"JKL", "MNO"), peptide_sum_t0 = c(12798511, 
24445998, 430914, 4169733152, 1040954968), peptide_sum_t1 = c(71875113, 
90456209, 1425107, 3864848640, 908559156), peptide_sum_t2 = c(80887897, 
94159050, 1567133, 3063087696, 654452648), peptide_sum_t4 = c(134109987, 
135974685, 2576246, 2991914240, 694374412), peptide_sum_t6 = c(138155397, 
143414778, 2586848, 2196034192, 508895062)), class = c("tbl_df", 
"tbl", "data.frame"), row.names = c(NA, -5L)), structure(list(
    `Peptide Modified Sequence` = c("ABC", "DEF", "GHI", "JKL", "MNO"), peptide_sum_t0 = c(11405482, 
    21481235, 376354, 3738850032, 923938424), peptide_sum_t1 = c(66144582, 
    88866961, 1430590, 3592766336, 807249328), peptide_sum_t2 = c(69746566, 
    77691183, 1300239, 2771424752, 608173524), peptide_sum_t4 = c(127165597, 
    132770276, 2615954, 2959854784, 689271096), peptide_sum_t6 = c(138884615, 
    156055042, 2950787, 2219060208, 501535220)), class = c("tbl_df", 
"tbl", "data.frame"), row.names = c(NA, -5L)), structure(list(
    `Peptide Modified Sequence` = c("ABC", "DEF", "GHI", "JKL", "MNO"), peptide_sum_t0 = c(9940092, 
    19013854, 334340, 3856996752, 868267100), peptide_sum_t1 = c(64178814, 
    86212062, 1386698, 3411110848, 805829180), peptide_sum_t2 = c(67558997, 
    75232819, 1232377, 2768347280, 606519264), peptide_sum_t4 = c(114366051, 
    115555603, 2182557, 2922670992, 647771528), peptide_sum_t6 = c(156789588, 
    158925759, 2856535, 2321414288, 522189252)), class = c("tbl_df", 
"tbl", "data.frame"), row.names = c(NA, -5L)), structure(list(
    `Peptide Modified Sequence` = c("ABC", "DEF", "GHI", "JKL", "MNO"), peptide_sum_t0 = c(9567490, 
    17515164, 320470, 3732104688, 840050720), peptide_sum_t1 = c(59593231, 
    80608967, 1214322, 3261328064, 714164192), peptide_sum_t2 = c(67902028, 
    78524336, 1178027, 2635069312, 637725132), peptide_sum_t4 = c(113122689, 
    112846563, 2291548, 2879263456, 647756704), peptide_sum_t6 = c(148436288, 
    152687330, 2912218, 2370981648, 539368072)), class = c("tbl_df", 
"tbl", "data.frame"), row.names = c(NA, -5L)), structure(list(
    `Peptide Modified Sequence` = c("ABC", "DEF", "GHI", "JKL", "MNO"), peptide_sum_t0 = c(9080372, 
    16406681, 276146, 3520329840, 758908128), peptide_sum_t1 = c(56032381, 
    81149457, 1220275, 3300964608, 717219612), peptide_sum_t2 = c(69496622, 
    83156379, 1340797, 2722983344, 600339772), peptide_sum_t4 = c(116170303, 
    121481896, 2441887, 2647622272, 631005176), peptide_sum_t6 = c(144268687, 
    147230236, 2742541, 2051729408, 498884202)), class = c("tbl_df", 
"tbl", "data.frame"), row.names = c(NA, -5L)))), class = "data.frame", row.names = c(NA, 
-5L))

It should result in a data frame structured as shown below:

# A tibble: 5 x 2
# Groups:   Replicate [5]
  Replicate data            
      <int> <list>          
1         1 <tibble [5 x 6]>
2         2 <tibble [5 x 6]>
3         3 <tibble [5 x 6]>
4         4 <tibble [5 x 6]>
5         5 <tibble [5 x 6]>

At the moment, I'm running the following code to divide each row within each nested data tibble by a reference row (row 5), while ignoring the Peptide Modified Sequence column.

dat_calc[[2]][[1]] <- dat_calc[[2]][[1]] %>% mutate_each(funs(./.[5]), setdiff(names(.), `Peptide Modified Sequence`))
dat_calc[[2]][[2]] <- dat_calc[[2]][[2]] %>% mutate_each(funs(./.[5]), setdiff(names(.), `Peptide Modified Sequence`))
dat_calc[[2]][[3]] <- dat_calc[[2]][[3]] %>% mutate_each(funs(./.[5]), setdiff(names(.), `Peptide Modified Sequence`))
dat_calc[[2]][[4]] <- dat_calc[[2]][[4]] %>% mutate_each(funs(./.[5]), setdiff(names(.), `Peptide Modified Sequence`))
dat_calc[[2]][[5]] <- dat_calc[[2]][[5]] %>% mutate_each(funs(./.[5]), setdiff(names(.), `Peptide Modified Sequence`))

This code works and returns nested tibbles in which the 'ABC', 'DEF', 'GHI', and 'JKL' rows are each divided by the 'MNO' row. Those four rows give the appropriate ratios, and the 'MNO' row in each tibble becomes 1. There must be a more elegant way to handle this than repeating the same line for every replicate. Does anyone have any suggestions?

Thanks!

Eddjah
  • It's easier to help you if you include a simple [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) with sample input and desired output that can be used to test and verify possible solutions. Have you tried `purrr::map`? – MrFlick Jul 12 '20 at 22:44
  • The documentation for the `nest()` function has some examples that may help. They use `purrr::map()`, which @MrFlick pointed out. https://tidyr.tidyverse.org/articles/nest.html – Eugene Chong Jul 12 '20 at 22:46
  • With `dplyr >= 1.0.0` you can try `dat_calc %>% rowwise() %>% mutate(data = list())` and inside the `list` goes the function you want to apply to each tibble. Refer to your data column as `data` (don't use the dot `.`). – TimTeaFan Jul 12 '20 at 22:55
  • Thank you for the suggestions and link to reproducible examples. Check out the updated edits for additional info. – Eddjah Jul 13 '20 at 15:13

1 Answer


As mentioned in the comments above, purrr::map is there to help you with this task :)

In your example, that would look something like this:

library(dplyr)
library(purrr)

dat_calc.normalized <- dat_calc %>%
  mutate(
    data = map(.x = data, .f = function(df){
      df %>%
        mutate(
          across(.cols = -`Peptide Modified Sequence`, .fns = ~./.[5])
        )
    })
  )

(Also, consider switching to mutate(across(...)) instead of mutate_each, which is now deprecated.)
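If it helps to check the idea on something small, here is a minimal sketch on made-up data. The `toy` tibble and the `normalize_to()` helper are hypothetical names introduced only for this illustration, and looking up the reference row by peptide name rather than by position 5 is my own assumption about what might be more robust, not something the question requires. The last step shows the `rowwise()` variant suggested in the comments, which should give the same result with `dplyr >= 1.0.0`.

```r
library(dplyr)
library(purrr)
library(tibble)

# Hypothetical toy data: two replicates, three peptides, two time points.
toy <- tibble(
  Replicate = 1:2,
  data = list(
    tibble(`Peptide Modified Sequence` = c("ABC", "DEF", "MNO"),
           peptide_sum_t0 = c(10, 20, 5),
           peptide_sum_t1 = c(30, 60, 15)),
    tibble(`Peptide Modified Sequence` = c("ABC", "DEF", "MNO"),
           peptide_sum_t0 = c(8, 4, 2),
           peptide_sum_t1 = c(9, 3, 3))
  )
)

# Hypothetical helper: divide every numeric column by its value in the
# reference row, finding that row by peptide name instead of by position.
normalize_to <- function(df, ref = "MNO") {
  ref_row <- which(df$`Peptide Modified Sequence` == ref)
  stopifnot(length(ref_row) == 1)  # exactly one reference row expected
  df %>% mutate(across(where(is.numeric), ~ .x / .x[ref_row]))
}

# Variant 1: map over the list column.
toy_normalized <- toy %>%
  mutate(data = map(data, normalize_to))

# Variant 2: rowwise(), where `data` refers to the tibble of the current row.
toy_rowwise <- toy %>%
  rowwise() %>%
  mutate(data = list(normalize_to(data))) %>%
  ungroup()

toy_normalized$data[[1]]
```

In both variants the reference row of each nested tibble should come out as 1 in every numeric column, which is a quick sanity check before running this on the real `dat_calc`.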

alex_jwb90