3

I have a dataframe df. It looks like:

xSample    a b  c
x          2 0  2
x1         3 0  0
x2         4 0  2 

I have this piece of code: new_df <- as.data.frame(sapply(df[,-1], function(x) sum(as.numeric(x) > 0)))

I want to go through each column of df, count the number of samples with a value greater than 0, and put that count into new_df, but only for columns (a, b, or c) where the count is greater than 0. The new_df should look like this:

       NonZeroCounts
a         3 
c         2  

The b row is not kept because it has 0 counts in every row.

After running my function mentioned above on my df, the output is:

xSample     NonZeroCounts
a          3 
b          0 
c          2
Jennifer
  • You can just remove that row where `NonZeroCounts == 0`. [Filter data.frame rows by a logical condition](https://stackoverflow.com/questions/1686569/filter-data-frame-rows-by-a-logical-condition) should help you with that. – Ronak Shah Jan 09 '18 at 01:14
  • ...or you could wrap the sum in an ifelse statement and return NULL if the sum is equal to zero; that would remove that row. – sconfluentus Jan 09 '18 at 01:17
  • `data.frame(NonZeroCounts = sapply(df1[-1], function(x) sum(x>0))[colSums(df1[-1]) > 0])` – d.b Jan 09 '18 at 01:18
  • @d.b This is great, thank you. If you add this as an answer, I can accept. – Jennifer Jan 09 '18 at 01:28
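
The first comment's suggestion, counting first and then dropping the zero rows, might look like the sketch below. It rebuilds the question's data as `df`; the column name `NonZeroCounts` is taken from the desired output, and `drop = FALSE` is there so the one-column result stays a data frame.

```r
# Example data from the question
df <- data.frame(xSample = c("x", "x1", "x2"),
                 a = c(2, 3, 4),
                 b = c(0, 0, 0),
                 c = c(2, 0, 2))

# Count values greater than 0 in every column except xSample
new_df <- data.frame(
  NonZeroCounts = sapply(df[, -1], function(x) sum(as.numeric(x) > 0))
)

# Keep only rows with a non-zero count; drop = FALSE keeps the
# one-column result a data frame instead of collapsing it to a vector
new_df <- new_df[new_df$NonZeroCounts > 0, , drop = FALSE]
new_df
#   NonZeroCounts
# a             3
# c             2
```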

2 Answers

3

First, use sapply to go through the relevant columns and count the values greater than 0 in each one. Then, use colSums to keep only the columns whose total is greater than 0.

counts <- sapply(df1[-1], function(x) sum(x > 0))
data.frame(NonZeroCounts = counts[colSums(df1[-1]) > 0])
#  NonZeroCounts
#a             3
#c             2

DATA

df1 = structure(list(xSample = c("x", "x1", "x2"), a = 2:4, b = c(0L, 
0L, 0L), c = c(2L, 0L, 2L)), .Names = c("xSample", "a", "b", 
"c"), class = "data.frame", row.names = c(NA, -3L))
d.b
  • While this code snippet may be the solution, including an explanation really helps to improve the quality of your post. Remember that you are answering the question for readers in the future, and those people might not know the reasons for your code suggestion – Rahul Gupta Jan 09 '18 at 05:27
  • is it possible to do something similar with data that contains NA? – FishyFishies Jun 18 '23 at 21:53
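
On the NA question in the last comment: when a column contains NA, `sum(x > 0)` and `colSums()` both return NA, so the missing values propagate into the result. One possible adjustment, assuming NAs should simply be ignored, is to pass `na.rm = TRUE` to `sum()` and filter on the counts themselves. The data frame `df_na` and the names `counts` and `res` are made up for illustration.

```r
# Hypothetical data with missing values (not from the question)
df_na <- data.frame(xSample = c("x", "x1", "x2"),
                    a = c(2, NA, 4),
                    b = c(0, 0, NA),
                    c = c(2, 0, 2))

# Count values greater than 0 per column, ignoring NAs
counts <- sapply(df_na[-1], function(x) sum(x > 0, na.rm = TRUE))

# Keep only the columns with at least one value greater than 0
res <- data.frame(NonZeroCounts = counts[counts > 0])
res
#   NonZeroCounts
# a             2
# c             2
```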
1

Another way of doing this:

## Your data
df <- data.frame(a = c(2, 3, 4), b = c(0, 0, 0), c = c(2, 0, 2))

data.frame(NonZeroCounts=colSums(df!=0)[colSums(df!=0)!=0])

#    NonZeroCounts
#a          3
#c          2
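
To see why this works: `df != 0` yields a logical matrix, and `colSums()` counts its TRUE values per column. Below is a sketch of the same idea with the count stored once rather than computed twice; the names `nonzero` and `result` are only illustrative.

```r
df <- data.frame(a = c(2, 3, 4), b = c(0, 0, 0), c = c(2, 0, 2))

# df != 0 is a logical matrix; colSums() treats TRUE as 1 and FALSE as 0
nonzero <- colSums(df != 0)
nonzero
# a b c 
# 3 0 2 

# Drop the columns whose count is zero
result <- data.frame(NonZeroCounts = nonzero[nonzero != 0])
result
#   NonZeroCounts
# a             3
# c             2
```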
Santosh M.