Take a sample of my dataset to have these row names: HKSJS_F1, SJSKA_F4, AJSIWAL_F1, SJSKSUE_F3, AKSICLS_F4, AKAASLE_F1

Using R, I need to group the rows by their ending, i.e. the subgroup determined by the F* suffix (e.g. F1 or F2). I then need to count how many instances of each subgroup there are and return the counts in a CSV file as my output. I have printed my row names using `genenames <- row.names(dataset)` followed by `print(genenames)`, but I am not sure where to go from here.

Any help appreciated.

ldorman
  • Can you provide a reproducible example, using `dput(head(genenames))`? – Lime Oct 13 '20 at 17:11
  • Use `tidyr::separate` to create a `subgroup` column, then see the FAQ on [counting groups](https://stackoverflow.com/q/9809166/903061). If you need more help than that, please share a reproducible sample of data, preferably using `dput()` as Lime suggests. – Gregor Thomas Oct 13 '20 at 17:15

2 Answers

Does this answer your question?

> library(dplyr)
> dat
        col1
1   HKSJS_F1
2   SJSKA_F4
3 AJSIWAL_F1
4 SJSKSUE_F3
5 AKSICLS_F4
6 AKAASLE_F1
> dat %>% group_by(gsub('(.*)_(F.+)','\\2',col1)) %>% summarise(Count = n())
`summarise()` ungrouping output (override with `.groups` argument)
# A tibble: 3 x 2
  `gsub("(.*)_(F.+)", "\\\\2", col1)` Count
  <chr>                               <int>
1 F1                                      3
2 F3                                      1
3 F4                                      2

Sample data used:

> dput(dat)
structure(list(col1 = c("HKSJS_F1", "SJSKA_F4", "AJSIWAL_F1", 
"SJSKSUE_F3", "AKSICLS_F4", "AKAASLE_F1")), class = "data.frame", row.names = c(NA, 
-6L))
> 
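The grouping column above ends up named after the whole `gsub()` call, and the result is not yet written to disk. A minimal sketch of one way to tidy that up, assuming the same `dat` as above: the column name `subgroup` and the temporary output file are illustrative choices, not part of the original answer.

```r
library(dplyr)

# Same sample data as in the dput() output above
dat <- data.frame(col1 = c("HKSJS_F1", "SJSKA_F4", "AJSIWAL_F1",
                           "SJSKSUE_F3", "AKSICLS_F4", "AKAASLE_F1"))

counts <- dat %>%
  # keep only the captured F* suffix that follows the last underscore
  mutate(subgroup = sub("^.*_(F.+)$", "\\1", col1)) %>%
  # one row per subgroup; name = "Count" sets the count column's name
  count(subgroup, name = "Count")

# Assumed output location: replace tempfile() with a real path
out_file <- tempfile(fileext = ".csv")
write.csv(counts, out_file, row.names = FALSE)
```

If the names are stored as row names rather than in a column, `dat` could be built first with `data.frame(col1 = row.names(dataset))`.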
Karthik S

If your names are already in a column you may try this:

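library(tidyverse)

# dataset: the names live in an ordinary column
df <- data.frame(text = c("HKSJS_F1", "SJSKA_F4", "AJSIWAL_F1",
                          "SJSKSUE_F3", "AKSICLS_F4", "AKAASLE_F1"))

# extract the trailing F<digits> suffix and count each distinct value
df2 <- df %>% 
  count(text_ending = str_extract(text, "F[0-9]+$"))

# write.csv2 writes a semicolon-separated file
write.csv2(df2, file = "yourpath/csvname.csv", row.names = FALSE)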

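When your text is stored in the row names, you can try this:

library(tidyverse)

# dataset: the names live in the row names
df <- data.frame(text = rep(1, 6)) 
row.names(df) <- c("HKSJS_F1", "SJSKA_F4", "AJSIWAL_F1",
                     "SJSKSUE_F3", "AKSICLS_F4", "AKAASLE_F1")

# move the row names into a column (add_rownames() is deprecated),
# then extract the trailing F<digits> suffix and count each distinct value
df2 <- df %>% 
  rownames_to_column("rowtext") %>%
  count(rowname_ending = str_extract(rowtext, "F[0-9]+$"))

# write.csv2 writes a semicolon-separated file
write.csv2(df2, file = "yourpath/csvname.csv", row.names = FALSE)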

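In both code chunks you need to replace the placeholder path in `write.csv2` with your own. Note that `write.csv2` writes semicolon-separated values with a comma as the decimal mark; use `write.csv` if you need comma-separated output.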
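If the tidyverse is not available, the same count can be sketched in base R. This is only an illustration under stated assumptions: `dataset` is a hypothetical stand-in for the asker's object, the subgroup is taken to be everything after the last underscore, and the output goes to a temporary file.

```r
# Hypothetical stand-in for the asker's data; only the row names matter here
dataset <- data.frame(value = 1:6,
                      row.names = c("HKSJS_F1", "SJSKA_F4", "AJSIWAL_F1",
                                    "SJSKSUE_F3", "AKSICLS_F4", "AKAASLE_F1"))

genenames <- row.names(dataset)

# Drop everything up to and including the last underscore: "HKSJS_F1" -> "F1"
subgroup <- sub("^.*_", "", genenames)

# table() counts each distinct subgroup; as.data.frame() gives two columns
counts <- as.data.frame(table(subgroup), responseName = "Count",
                        stringsAsFactors = FALSE)

# Assumed output location: replace tempfile() with a real path
out_file <- tempfile(fileext = ".csv")
write.csv(counts, out_file, row.names = FALSE)
```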

tamtam