Let's say I have data in wide format (samples in rows and species in columns).

species <- data.frame(
    Sample = 1:10, 
    Lobvar = c(21, 15, 12, 11, 32, 42, 54, 10, 1, 2), 
    Limtru = c(2, 5, 1, 0, 2, 22, 3, 0, 1, 2), 
    Pocele = c(3, 52, 11, 30, 22, 22, 23, 10, 21, 32), 
    Genmes = c(1, 0, 22, 1, 2, 32, 2, 0, 1, 2)
)

I want to automatically replace the species names with their functional groups, using a reference table that covers all of my species (so it must work even when the reference lists more species than the dataset contains). For example:

reference <- data.frame(
    Species_name = c("Lobvar", "Ampmis", "Pocele", "Genmes", "Limtru", "Secgio", "Nasval", "Letgos", "Salnes", "Verbes"), 
    Functional_group = c("Crustose", "Geniculate", "Erect", "CCA", "CCA", "CCA", "Geniculate", "Turf","Turf", "Crustose"),
    stringsAsFactors = FALSE
)

EDIT

Thanks to @Dan Y's suggestions, I can now change the species names to their functional group names:

names(species)[2:ncol(species)] <- reference$Functional_group[match(names(species), reference$Species_name)][-1]

However, my actual data.frame has more species, so several columns end up with the same functional group name. I would now like to sum the columns that share a name. I updated the example so that the result contains more than one column per functional group.

So I get this:

Sample Crustose CCA Erect CCA Crustose
      1       21   2     3   1        2
      2       15   5    52   0        3
      3       12   1    11  22        4
      4       11   0    30   1        1
      5       32   2    22   2        0
      6       42  22    22  32        0

and the final result I am looking for is this:

Sample Crustose CCA Erect
     1       23   3     3
     2       18   5    52
     3       16  23    11
     4       12   1    30
     5       32   4    22
     6       42  54    22

How would you advise approaching this? Thanks for your help and for the amazing suggestions I have already received.

krads

1 Answer

Re Q1) We can use match to do the name lookup:

names(species)[2:ncol(species)] <- reference$Functional_group[match(names(species), reference$Species_name)][-1]
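
The one-liner above drops the first element with [-1] because "Sample" has no entry in the reference, so match returns NA for it. As a rough sketch of the same lookup that does not depend on the unmatched column being first, the version below renames only the columns that were actually found. The data are a shortened copy of the question's example, and idx and renamed are names made up for this illustration:

```r
# Shortened copy of the question's data (first three samples only).
species <- data.frame(
    Sample = 1:3,
    Lobvar = c(21, 15, 12),
    Limtru = c(2, 5, 1),
    Pocele = c(3, 52, 11),
    Genmes = c(1, 0, 22)
)
reference <- data.frame(
    Species_name = c("Lobvar", "Ampmis", "Pocele", "Genmes", "Limtru"),
    Functional_group = c("Crustose", "Geniculate", "Erect", "CCA", "CCA"),
    stringsAsFactors = FALSE
)

# Position of each column name in the reference; NA where there is no entry
# (here only "Sample").
idx <- match(names(species), reference$Species_name)

# Rename only the matched columns, so unmatched ones keep their names.
renamed <- names(species)
renamed[!is.na(idx)] <- reference$Functional_group[idx[!is.na(idx)]]
names(species) <- renamed

print(names(species))
```

Extra rows in the reference (such as "Ampmis" here) are simply never looked up, so a reference that is larger than the dataset is not a problem.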

Re Q2) Then we can use mapply to call rowSums on each set of same-named columns, after a regular expression strips any numeric suffix (such as ".1") from the column names:

namevec <- gsub("\\.[[:digit:]]", "", names(species))
mapply(function(x) rowSums(species[which(namevec == x)]), unique(namevec))
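
For anyone who wants a data.frame back rather than a matrix, here is a minimal sketch of the same idea that leaves the Sample column out of the summing. It assumes the columns have already been renamed as in Q1; the data are a shortened copy of the question's example, and groups, summed and result are names chosen for this illustration:

```r
# Shortened copy of the question's data, already renamed to functional groups.
species <- data.frame(
    Sample = 1:3,
    Lobvar = c(21, 15, 12),
    Limtru = c(2, 5, 1),
    Pocele = c(3, 52, 11),
    Genmes = c(1, 0, 22)
)
names(species) <- c("Sample", "Crustose", "CCA", "Erect", "CCA")

# Group label for every data column, taken before any subsetting.
groups <- names(species)[-1]

# For each distinct group, add up the columns carrying that label.
# Columns are picked by position (logical index), not by name.
summed <- sapply(unique(groups), function(g) {
    rowSums(species[-1][groups == g])
})

result <- data.frame(Sample = species$Sample, summed)
print(result)
```

As far as I know, single-bracket subsetting of a data frame with duplicated column names makes the names unique (the second "CCA" becomes "CCA.1"), which is the kind of suffix the gsub call above removes. Taking the labels from names(species) before subsetting avoids having to strip anything.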
DanY
  • That is brilliant! thanks! (also thanks for the edits and corrections to my question) – Fabio Favoretto Sep 07 '18 at 01:39
  • So it works, but what if I then want to sum all the functional group columns with the same name? – Fabio Favoretto Sep 07 '18 at 02:23
  • `rowSums(species[grep("same_name", names(species))])` – DanY Sep 07 '18 at 15:36
  • No, that does not work for me; it returns all zeros: `rowSums(species[grep("same_name", names(species))])` gives `[1] 0 0 0 0 0 0 0 0 0 0`. What I would like is a final data.frame that contains the sum for each functional group. We successfully changed the names, but I am stuck on summing all columns with the same name. I found a solution by converting to long format and then back to wide, but I was wondering whether there is a quicker way... – Fabio Favoretto Sep 08 '18 at 15:41
  • So in my previous comment, you actually have to put in the colname where I wrote "same_name" (e.g., you write "Crustose" instead of "same_name"). However, this only works for one name at a time. See my edited answer above for how to sum columns with the same name, and to do so for all the different colnames. – DanY Sep 08 '18 at 23:01
  • That works like a charm! Thanks. A final question: I am reading about gsub(), so the "\\.[[:digit:]]" part is a combination of different regular expression syntaxes... what does it mean? Take any character (\\.) in a hexadecimal form (a b c d etc.)? – Fabio Favoretto Sep 09 '18 at 00:38
  • The `gsub` command is saying to replace "a period followed by a digit" with an empty string in the character vector of column names of the data frame. Here `\\.` matches a literal period (not "any character") and `[[:digit:]]` matches one of 0-9 (not hexadecimal). – DanY Sep 09 '18 at 04:44
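
To make the last two comments concrete, here is a small sketch of what that pattern does to some made-up column names. The vector x and the second, anchored pattern are illustrations only and not part of the answer; the anchored variant is one possible way to handle suffixes with more than one digit:

```r
# Made-up column names, including the kind of suffix R adds to duplicates.
x <- c("Sample", "Crustose", "CCA", "Erect", "CCA.1", "Crustose.12")

# Pattern from the answer: a literal period followed by exactly one digit.
one_digit <- gsub("\\.[[:digit:]]", "", x)
print(one_digit)
# "Sample" "Crustose" "CCA" "Erect" "CCA" "Crustose2"

# Variant: a period followed by one or more digits at the end of the name.
all_digits <- gsub("\\.[[:digit:]]+$", "", x)
print(all_digits)
# "Sample" "Crustose" "CCA" "Erect" "CCA" "Crustose"
```

The single-digit pattern is enough while no name is repeated more than ten times; beyond that, two-digit suffixes such as ".12" would leave a stray digit behind.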