How do you write R code that loops through and manipulates rows with identical values in one column (e.g. Names)?

Question

As an example, this table snippet:

##   AmAcid Codon Number PerThous
## 1    Gly   GGG  25874    19.25
## 2    Gly   GGA  13306     9.90
## 3    Ser   UAC  25320    18.84
## 4    Ser   UAU  68310    50.82
## 5    Val   GUC  25874    19.25
## 6    Val   GUA  13306     9.90
## 7    Gly   GGT  25320    18.84
## 8    Gly   GGC  68310    50.82
...

I want to write a function/loop that identifies all rows where AmAcid == Gly, then manipulates their respective values in the Number and/or PerThous columns (finding the max, min, sum, etc.), and repeats this for every other unique string in AmAcid, not just Gly.

I have this very rough pseudocode, but I think I'm waaaay off base on R's syntax.

for (i in AmAcid_tabl$AmAcid) {
  deviation$i <- (max(AmAcid_tabl$Number)-min(AmAcid_tabl$Number))/mean(AmAcid_tabl$Number)
}

How can I implement this properly?

Look at the `tidyverse` and especially the `dplyr` package. You can use `group_by` and `mutate` to do exactly that. Even without any for loop. — symbolrush, May 13 '20 at 05:39

score 1 · Answer 1 · answered May 13 '20 at 05:38

Using dplyr:

library(tidyverse)

dat <- tribble(
  ~AmAcid, ~Codon, ~Number, ~PerThous,
  "Gly",   "GGG",  25874,    19.25,
  "Gly",   "GGA",  13306,     9.90,
  "Ser",   "UAC",  25320,    18.84,
  "Ser",   "UAU",  68310,    50.82,
  "Val",   "GUC",  25874,    19.25,
  "Val",   "GUA",  13306,     9.90,
  "Gly",   "GGT",  25320,    18.84,
  "Gly",   "GGC",  68310,    50.82
)

dat %>% 
  group_by(AmAcid) %>% 
  mutate(i = (max(Number) - min(Number)) / mean(Number)) %>% 
  ungroup()

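You may want to use summarise() instead of mutate(), depending on what you're trying to achieve: mutate() keeps every row and attaches the group's value to each one, while summarise() collapses each group to a single row.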
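To make the mutate()/summarise() choice concrete, here is a minimal sketch that runs both on the same sample rows and also computes the max, min and sum mentioned in the question. The column names rel_range, max_n, min_n and sum_n are made up for illustration, and the data is rebuilt as a plain data.frame only so the block runs on its own.

```r
library(dplyr)

# Same sample rows as above, rebuilt here so the block is self-contained
dat <- data.frame(
  AmAcid = c("Gly", "Gly", "Ser", "Ser", "Val", "Val", "Gly", "Gly"),
  Codon  = c("GGG", "GGA", "UAC", "UAU", "GUC", "GUA", "GGT", "GGC"),
  Number = c(25874, 13306, 25320, 68310, 25874, 13306, 25320, 68310)
)

# mutate(): keeps all 8 rows; every row gets its own group's value
per_row <- dat %>%
  group_by(AmAcid) %>%
  mutate(rel_range = (max(Number) - min(Number)) / mean(Number)) %>%
  ungroup()

# summarise(): one row per AmAcid, several statistics at once
# (rel_range, max_n, min_n, sum_n are hypothetical column names)
per_group <- dat %>%
  group_by(AmAcid) %>%
  summarise(
    max_n     = max(Number),
    min_n     = min(Number),
    sum_n     = sum(Number),
    rel_range = (max(Number) - min(Number)) / mean(Number)
  )

nrow(per_row)    # 8: one row per codon
nrow(per_group)  # 3: one row per amino acid
```

If you need the per-group value next to each codon (for example to compare a codon's Number against its group's maximum), mutate() is probably the better fit; for a compact per-amino-acid table, summarise() is.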

score 1 · Accepted Answer · answered May 13 '20 at 05:39

Several functions can perform grouped operations like this.

In base R, assuming the table is stored in a data frame df, you can do :

aggregate(Number~AmAcid, df, function(x) (max(x) - min(x))/mean(x))

#  AmAcid    Number
#1    Gly 1.6566222
#2    Ser 0.9182954
#3    Val 0.6415518

Using dplyr :

library(dplyr)
df %>% 
  group_by(AmAcid) %>% 
  summarise(new_col = (max(Number) - min(Number))/mean(Number))

Or using data.table :

library(data.table)
setDT(df)[, .(new_col = (max(Number) - min(Number))/mean(Number)), AmAcid]
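For completeness, if you do want an explicit loop, the pseudocode from the question can be adapted roughly as follows. This is only a sketch: it assumes the table is called AmAcid_tabl as in the question, and the loop variable aa is a made-up name. The original attempt had two problems: deviation$i always refers to an element literally named "i", and max(AmAcid_tabl$Number) looks at the whole column instead of just the rows for the current amino acid.

```r
# Sample data copied from the question (only the columns needed here)
AmAcid_tabl <- data.frame(
  AmAcid = c("Gly", "Gly", "Ser", "Ser", "Val", "Val", "Gly", "Gly"),
  Number = c(25874, 13306, 25320, 68310, 25874, 13306, 25320, 68310)
)

# Loop over each distinct amino acid once, not over every row
groups <- unique(as.character(AmAcid_tabl$AmAcid))

# Named numeric vector to hold one result per amino acid
deviation <- setNames(numeric(length(groups)), groups)

for (aa in groups) {
  # Number values belonging to the current amino acid only
  n <- AmAcid_tabl$Number[AmAcid_tabl$AmAcid == aa]
  # Index by the *value* of aa, not by a literal name
  deviation[aa] <- (max(n) - min(n)) / mean(n)
}

deviation

# Loop-free base R equivalent using tapply()
tapply(AmAcid_tabl$Number, AmAcid_tabl$AmAcid,
       function(x) (max(x) - min(x)) / mean(x))
```

The loop gives the same three values as the aggregate() call above; the grouped one-liners are generally shorter and easier to extend to more columns.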
