I would like to create a function that counts, for a given name, how many values in columns L2, L3, and L4 are greater than 0.

Name    L1    L2    L3    L4
Carl    1     1     0     2
Carl    0     1     4     1
Joe     3     0     3     1
Joe     2     2     1     0

For example, someFunction(Carl) = 5 and someFunction(Joe) = 4

I do not want to sum the values; for example, someFunction(Joe) = 7 is incorrect. I hope this makes sense. I am pretty stuck on this. Thanks!

Amanda
  • Will this do it? `sum(df[df$Name == 'Carl', -c(1, 2)] > 0)` – Gopala Jan 19 '17 at 13:50
  • Try also `tapply(rowSums(df[,3:5]>0),df$Name,sum)`. – nicola Jan 19 '17 at 13:59
  • @Gopala thank you, but I am getting this error in return: only defined on a data frame with all numeric variables – Amanda Jan 19 '17 at 14:00
  • I don't know what the column types are in your data. `str(df)` should tell you. – Gopala Jan 19 '17 at 14:01
  • @nicola +1. That is a good solution for getting result on all names. I was focusing on one since a function(name) was mentioned. – Gopala Jan 19 '17 at 14:02
  • @nicola that works for getting a result on all the names. Do you know how I would get one result for just one name? for example, function(Joe) = 4 – Amanda Jan 19 '17 at 14:03
  • My code above gives the result for one name ('Carl' in that case). You can easily substitute Joe like this: `sum(df[df$Name == 'Joe', -c(1, 2)] > 0)`. It is also easy to wrap in a function. – Gopala Jan 19 '17 at 14:14
  • @Gopala Thank you for your suggestions. However, I am still getting the same error: only defined on a data frame with all numeric variables – Amanda Jan 19 '17 at 14:23
  • @nicola's solution returns a named vector, so you can subset by the name you require –  Jan 19 '17 at 14:24
  • @user127649 thanks, do you know how I go about doing this? If you can't tell, I'm very very new to R :) – Amanda Jan 19 '17 at 14:27
  • @count below posted an answer with a function wrapper. Take a look at that. The actual piece of code is very similar to what I gave above. – Gopala Jan 19 '17 at 14:28
  • I do, but as it looks like homework, or that you're just waiting for someone else to do all the work, I think the hint should point you in the right direction. Google is your friend –  Jan 19 '17 at 14:31
  • @user127649 nope, not homework. Just a biologist trying to figure out R. I'm pretty confused by the syntax... there are so many ways to do one thing! – Amanda Jan 19 '17 at 14:38
  • See if just sticking `['Carl']` on the end of @nicola's solution works. See [How to subset a named vector in R](http://stackoverflow.com/questions/20794650/how-to-subset-a-named-vector-in-r) –  Jan 19 '17 at 15:17
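
The suggestions in the comments above can be put together as a minimal sketch. It assumes the data frame looks like the table in the question (rebuilt here as `df`), and `count_positive` is a hypothetical name for the wrapper. Selecting the columns by name rather than by position is also an assumption, made so that no non-numeric column is included by accident.

```r
# Example data rebuilt from the table in the question.
df <- data.frame(
  Name = c("Carl", "Carl", "Joe", "Joe"),
  L1 = c(1, 0, 3, 2),
  L2 = c(1, 1, 0, 2),
  L3 = c(0, 4, 3, 1),
  L4 = c(2, 1, 1, 0)
)

# Count the positive entries of L2..L4 in each row, then total them per name.
counts <- tapply(rowSums(df[, c("L2", "L3", "L4")] > 0), df$Name, sum)

# tapply() returns a named result, so a single name can be picked out by indexing.
counts["Joe"]

# Hypothetical wrapper that returns the count for one name as a plain number.
count_positive <- function(dat, name) {
  per_name <- tapply(rowSums(dat[, c("L2", "L3", "L4")] > 0), dat$Name, sum)
  as.vector(per_name[name])
}

count_positive(df, "Carl")  # 5
count_positive(df, "Joe")   # 4
```

Running `str(df)` on the real data first, as suggested above, shows which columns are numeric and therefore safe to pass to the comparison.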

3 Answers

Or if you want to have a function:

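# Count the non-zero values for one name in every column from the third onward.
# Note that != 0 also counts negative values; use > 0 to count only positive ones.
give_count <- function(dat, name) {
    sum(dat[dat$Name == name, 3:ncol(dat)] != 0)
}
give_count(data, "Joe")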
count
  • Thank you. But when I use this on my data frame, I am getting an answer back as NA. – Amanda Jan 19 '17 at 14:46
  • Is your `data.frame` named `data`? If not, change the first argument in the function call to the name of your `data.frame`. If you have more than 5 columns in your data and only want three columns evaluated, change the `ncol(dat)` in the function to `5`. – count Jan 19 '17 at 14:53
  • Thank you. For some reason I am still getting NA. I actually have 14 columns in total, but I only want to evaluate columns 6:13. – Amanda Jan 19 '17 at 15:00
  • Have you checked with `str(your.data.frame)` whether we are talking about integer/numeric values? – count Jan 19 '17 at 15:02
  • columns 1:2 are integer, columns 3:5 are "Factor", and columns 6:13 are integers. Column 14 is logical – Amanda Jan 19 '17 at 15:05
  • Did you alter the function accordingly? `sum(dat[dat$Name == name,6:13]!=0)` – count Jan 19 '17 at 15:06
  • This seemed to work! However, it is giving me a result that is larger than I think it should be... I also have negative numbers in these columns. Does the `!=0` part of your code pick up all the values that aren't equal to 0? I need only the values that are > 0 – Amanda Jan 19 '17 at 15:12
  • Yup, negative values will be counted. Just change the `!=` to `>` and it should work. – count Jan 19 '17 at 15:15
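
Pulling the comment thread above together, here is a hedged sketch of a variant that takes the columns to evaluate as an argument and counts only values greater than 0. The name `count_positive_in`, its `cols` parameter, and the use of `na.rm = TRUE` are assumptions for illustration. The `toy` data is the question's table with one negative value and one `NA` added on purpose to exercise those cases.

```r
# Hypothetical helper: count values > 0 in the chosen columns for one name.
count_positive_in <- function(dat, name, cols) {
  # Keep only the rows for the requested name (rows with a missing Name are skipped).
  rows <- !is.na(dat$Name) & dat$Name == name
  # The comparison gives a logical matrix; sum() counts the TRUEs and ignores NAs.
  sum(dat[rows, cols] > 0, na.rm = TRUE)
}

# The question's table, altered on purpose: one negative value and one NA.
toy <- data.frame(
  Name = c("Carl", "Carl", "Joe", "Joe"),
  L1 = c(1, 0, 3, 2),
  L2 = c(1, -1, 0, 2),
  L3 = c(0, 4, NA, 1),
  L4 = c(2, 1, 1, 0)
)

count_positive_in(toy, "Carl", 3:5)                 # columns by position: 4
count_positive_in(toy, "Joe", c("L2", "L3", "L4"))  # columns by name: 3
```

For the 14-column data frame described in the comments, the call would pass `6:13` as `cols`.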

We can try with data.table. Convert the 'data.frame' to a 'data.table' (setDT(df1)), group by 'Name', and specify the columns of interest in .SDcols. Then unlist the Subset of Data.table (.SD), check which values are greater than 0, and take the sum of the resulting logical vector. The result is assigned (:=) to create the 'N' column.

library(data.table)
setDT(df1)[, N := sum(unlist(.SD)>0), Name, .SDcols = L2:L4]
df1
#   Name L1 L2 L3 L4 N
#1: Carl  1  1  0  2 5
#2: Carl  0  1  4  1 5
#3:  Joe  3  0  3  1 4
#4:  Joe  2  2  1  0 4

Or another option is

setDT(df1)[,  N := sum(unlist(lapply(.SD, `>`, 0))), Name, .SDcols = L2:L4]

Or we can use rowsum/rowSums combination in base R

rowSums(rowsum(+(df1[3:5]>0), df1$Name))
#   Carl  Joe 
#   5    4 

If we need only to do this for a particular 'Name'

setDT(df1)[Name == "Carl"][, sum(unlist(.SD) > 0), .SDcols = L2:L4]

Update

If we need a summarised output, do not assign (:=)

setDT(df1)[, .(N = sum(unlist(.SD)>0)), Name, .SDcols = L2:L4]
#   Name N
#1: Carl 5
#2:  Joe 4
akrun
  • Is setDT a function in R? Sorry, I'm very new! – Amanda Jan 19 '17 at 13:58
  • @Amanda It is a function in `data.table` to convert 'data.frame' to 'data.table'. There is a `setDF` function which does the opposite – akrun Jan 19 '17 at 14:00
  • I like your rowSums suggestion a lot. This gives me the count for all the names, however, do you know how I would get one result for one name? I would like to create a function, such that function(Joe) = 4 – Amanda Jan 19 '17 at 14:09
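
As a rough sketch of how the rowsum/rowSums result could be narrowed to one name: the result is a named vector, so it can be indexed by name. The data frame `df1` is rebuilt here from the question's table, and `count_for` is a hypothetical helper name.

```r
# Example data rebuilt from the table in the question.
df1 <- data.frame(
  Name = c("Carl", "Carl", "Joe", "Joe"),
  L1 = c(1, 0, 3, 2),
  L2 = c(1, 1, 0, 2),
  L3 = c(0, 4, 3, 1),
  L4 = c(2, 1, 1, 0)
)

# Per-name totals of the positive entries in columns 3 to 5 (L2, L3, L4).
per_name <- rowSums(rowsum(+(df1[3:5] > 0), df1$Name))

# Hypothetical helper: look up the total for a single name.
count_for <- function(name) per_name[[name]]

count_for("Joe")   # 4
count_for("Carl")  # 5
```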

I would encourage the tidyverse style of coding. With the dplyr and reshape2 packages, the code is easy to read:

library(dplyr)
library(reshape2)
df1 %>% 
  select(-L1) %>% 
  melt(id.vars = 1, na.rm = TRUE) %>% 
  group_by(Name) %>% 
  transmute(flag=value>0) %>% 
  summarize(sum(flag))


# A tibble: 2 × 2
    Name `sum(flag)`
  <fctr>       <int>
1   Carl           5
2    Joe           4
Rahul