Subsetting a dataset in R

Question

I have a question about filtering a dataset based on the sum of counts.

My file looks like this:

The first column contains gene names. For each gene, I want to calculate the sum of the values in the third column: for g1 it's 6, for g2 it's 16, and so on. I then want to keep only the rows of the input dataset that belong to genes whose sum is > 10, so that my output looks like

This is what I have tried so far:

tab <- read.table("input.txt", header = FALSE)
genelist <- split(tab, tab[, 1])

How can I sum the third column per gene and keep only the genes whose sum is > 10? I think I have to use sapply to loop through the list, but I am stuck here. Any help is appreciated.

@NelsonGon Not exactly. The OP wants to retain the original rows, not just the summed value :-) – Tim Biegeleisen, May 09 '19 at 05:11
With `dplyr` you can do `df %>% group_by(V1) %>% filter(sum(V3) > 10)` – Ronak Shah, May 09 '19 at 05:12
@RonakShah I can't actually find a duplicate, so maybe post that as an answer. – Tim Biegeleisen, May 09 '19 at 05:13
If you want to keep the `Sum` column: `df %>% group_by(V1) %>% mutate(Sum = sum(V3)) %>% filter(Sum > 10)` – NelsonGon, May 09 '19 at 05:17
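
The `dplyr` suggestion in the comments above can be tried on a small table. This is only a sketch: the question's actual input is not shown, so the data frame below is a hypothetical stand-in, built so that g1 sums to 6 and g2 sums to 16 as described. The column names `V1` to `V3` are an assumption, mimicking what `read.table(header = FALSE)` would assign.

```r
library(dplyr)

# Hypothetical stand-in for input.txt (not the OP's real data).
# V1..V3 mimic the default names from read.table(header = FALSE).
tab <- data.frame(
  V1 = c("g1", "g1", "g1", "g2", "g2", "g3"),
  V2 = c("a", "b", "c", "a", "b", "a"),
  V3 = c(1, 2, 3, 6, 10, 4)
)

# Group by gene, keep every row of a gene whose V3 total exceeds 10,
# then drop the grouping so later steps see a plain table.
kept <- tab %>%
  group_by(V1) %>%
  filter(sum(V3) > 10) %>%
  ungroup()

print(kept)
```

Because `filter()` is evaluated per group, `sum(V3) > 10` yields one TRUE or FALSE per gene, which keeps or drops all rows of that gene together.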

score 1 · Accepted Answer · answered May 09 '19 at 05:23

Is this what you're looking for?

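library(dplyr)  # provides %>%, arrange, group_by, summarise and filter

# Simulate 40 rows: a gene label, a second label and a numeric result
n_vars <- 40
gene <- sample(x = c("g1", "g2", "g3", "g4"), size = n_vars, replace = TRUE)
v1 <- sample(x = c("a", "b", "c", "d", "e", "f", "g"), size = n_vars, replace = TRUE)
result <- rnorm(n = n_vars, mean = 0, sd = 10)

# Sum result within each gene/v1 pair and keep only the totals above 10
df <- data.frame(gene, v1, result) %>%
  arrange(gene, v1) %>%
  group_by(gene, v1) %>%
  summarise(total = sum(result)) %>%
  filter(total > 10)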
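
Note that `summarise` returns one row per gene/v1 pair rather than the original rows. If the original rows should be retained, the `split` attempt from the question can also be completed in base R. The following is a minimal sketch under assumptions: the data frame is a hypothetical stand-in for `input.txt`, and the names `gene_sums`, `keep_genes`, `filtered` and `filtered_ave` are made up for illustration.

```r
# Hypothetical stand-in for input.txt (not the OP's real data).
tab <- data.frame(
  V1 = c("g1", "g1", "g1", "g2", "g2", "g3"),
  V2 = c("a", "b", "c", "a", "b", "a"),
  V3 = c(1, 2, 3, 6, 10, 4)
)

# Continue from the question: split by gene, then sum column 3 per gene.
genelist <- split(tab, tab[, 1])
gene_sums <- sapply(genelist, function(d) sum(d[, 3]))

# Names of the genes whose total exceeds 10.
keep_genes <- names(gene_sums)[gene_sums > 10]

# Keep the original rows that belong to those genes.
filtered <- tab[tab[, 1] %in% keep_genes, ]

# Alternative without split: ave() repeats each gene's sum on every row.
filtered_ave <- tab[ave(tab$V3, tab$V1, FUN = sum) > 10, ]

print(filtered)
```

Both variants leave the input rows untouched and only drop the genes that fail the threshold. The `ave()` form avoids the intermediate list.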

answered May 09 '19 at 05:23

Zeus

This works fine. Can you explain the code? I'm new to dplyr. – user3138373 May 09 '19 at 05:33
Sure, which part don't you understand? Take a look at https://www.tidyverse.org/ – Zeus May 09 '19 at 05:48
