0

I've been struggling on R for the last two days with a simple basic operation that would take 1 min to perform in Excel. Let me explain: I have a dataframe with 65 columns (variables, the species found) and 75 rows (obs, the quadrats). The first 25 rows are quadrats of site A, then the next 25 from site B then the last 25 from site C. Each column is a species and if the species X is present in Site A1, we have a 1 and if it's not there is a zero. I would like to study the diversity index of each site and not of each quadrat which is what specnumber() and diversity() will give me. Here is an exemple of the dataframe with 5 quadrats from site A and 3 species:

Site Quadrat Sp X. Sp Y. Sp Z.
A Quadrat A1 1 1 0
A Quadrat A2 0 1 0
A Quadrat A3 1 1 0
A Quadrat A4 0 0 1
A Quadrat A5 1 1 1

And here is what I would like to get:

Site Sp X. Sp Y. Sp Z.
Site A 3 4 2

I have tried sumRows and sum of course, as well as rbind and dozens of other things found online but nothing seems to work, I am really new to R:

new <- rbind(df, L = sum(df[1:25,]) )

new <- rowSums(df[1:25,]

Thank you so much

3 Answers3

3

Some example data.

df <- structure(list(Sites = c("Quadrat A1", "Quadrat A2", "Quadrat A3", 
"Quadrat A4", "Quadrat A5"), `Sp X.` = c(0L, 1L, 1L, 0L, 1L), 
    `Sp Y.` = c(0L, 1L, 1L, 0L, 1L), `Sp Z.` = c(0L, 0L, 0L, 
    1L, 1L)), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA, 
-5L))

Tidyverse solution. It should work with any number of sites.

library("dplyr")
library("stringr")

df |>
  mutate(Sites=str_glue("Site {x}", x=str_extract(Sites, "(?<=\\s)[[:alpha:]]+"))) |> 
  group_by(Sites) |>
  summarize(across(where(is.numeric), sum), .groups="drop")

Result.

# A tibble: 1 × 4
  Sites  `Sp X.` `Sp Y.` `Sp Z.`
  <glue>   <int>   <int>   <int>
1 Site A       3       3       2
rpolicastro
  • 1,265
  • 8
  • 14
  • 1
    +1 I was about to post a similar answer. [This answer](https://stackoverflow.com/questions/2341184/what-does-x-mean-in-regex) may be useful to decipher the regex. I would replace `str_extract` with `str_match(Sites, "^.* ([^0-9]+)[0-9]*$")[, 2]`. –  Oct 11 '22 at 12:12
  • Hi thanks for your answer, is there anyway to do this without using tidyverse? I am getting many errors although I do have the packages installed and on. `df |> Error: unexpected '>' in "df |>" > mutate(Sites=str_glue("Site {x}", x=str_extract(Sites, "(?<=\\s)[[:alpha:]]+"))) |> Error: unexpected '>' in " mutate(Sites=str_glue("Site {x}", x=str_extract(Sites, "(?<=\\s)[[:alpha:]]+"))) |>" > group_by(Sites) |> Error: unexpected '>' in " group_by(Sites) |>" > summarize(across(where(is.numeric), sum), .groups="drop").` – Edouard Duquesne Oct 11 '22 at 12:30
  • The native R pipe `|>` was added in R version `4.1.0` (I believe). If your R version is lower switch them to the magrittr pipe `%>%`. – rpolicastro Oct 11 '22 at 12:39
  • Indeed you were right, thank you very much. – Edouard Duquesne Oct 11 '22 at 12:46
0

You might reformat your data by adding a column for Site. You can write code to pull out the "A" and "B" from Quadrant, but that is making it harder than it needs to be.

df <- data.frame("Site" = c("A", "A", "A", "B", "B", "C"), "SpecA" = c(1,0,1,1,1,0), "SpecB" = c(0,1,0,1,1,1))

Then do something like this:

siteA <- sum(df$SpecA[df$Site == "A"])
Eric Krantz
  • 1,854
  • 15
  • 25
  • Thanks for your answer but I believe your code only does the sum for one column (Species A) and I have 65 columns so that would be not efficient at all... – Edouard Duquesne Oct 11 '22 at 12:15
0

If You know how many rows each squares occupy, you can use colSums for these rows and make a data frame out of it.

For exapmle data:

       Sites Sp_x Sp_y Sp_z
1 Quadrat A1    1    1    0
2 Quadrat A2    0    1    0
3 Quadrat A3    1    1    0
4 Quadrat B1    0    0    1
5 Quadrat B2    1    1    1
6 Quadrat B3    1    1    0
7 Quadrat C1    1    1    0
8 Quadrat C2    0    0    1
9 Quadrat C3    0    0    0
Quadrat_A <- colSums(data[1:3,-1])
Quadrat_B <- colSums(data[4:6,-1])
Quadrat_C <- colSums(data[7:9,-1])

data_merged <- t(data.frame(Quadrat_A,Quadrat_B,Quadrat_C, stringsAsFactors=FALSE))

Output:

          Sp_x Sp_y Sp_z
Quadrat_A    2    3    0
Quadrat_B    2    2    2
Quadrat_C    1    1    1
KacZdr
  • 1,267
  • 3
  • 8
  • 23
  • Hi, thank you for your answer. Do you know if there is a way to do this without having to write 65 times the code? I have 65 species so that would require quite a lot of time... – Edouard Duquesne Oct 11 '22 at 12:18
  • Updated, You can use colSums(). Now all you have to do is set the row range for specific quadrats. – KacZdr Oct 11 '22 at 12:41