This is what my data looks like; call it 'df'. I want to count the number of 'id' values created on a specific date, say 2017-11-04. I would also like to count the number of 'passed' dates and 'logic' values for that same date, i.e. 2017-11-04. Please note that 2017-11-04 is only an example: I would like to aggregate over every date in the 'date' column.

date            id      passed       logic
2017-11-04      101     2017-11-06   1
2017-11-04      102     2017-11-06   0
2017-11-04      103     2017-11-08   1
2017-11-05      104     NA           NA

PS-2: I have just started with R and Stack Overflow and am not yet familiar with the basic syntax and rules, so if this question needs any edits, please leave a comment and I will make the necessary changes.

Sotos
  • Welcome to StackOverflow! Please read the info about [how to ask a good question](http://stackoverflow.com/help/how-to-ask) and how to give a [reproducible example](http://stackoverflow.com/questions/5963269). This will make it much easier for others to help you. – Sotos Nov 07 '17 at 10:28
  • thank you @Sotos. I'll go through the link. – Harshit Sharma Nov 07 '17 at 10:36

2 Answers

You can use the dplyr package to group your data frame by date with group_by() and then compute one row of counts per group with summarise():

library(dplyr)

df %>% 
  group_by(date) %>%
  summarise(number_of_ids = n(),                          # rows (ids) per date
            number_of_passed_date = sum(!is.na(passed)),  # non-missing 'passed' values
            logic = sum(logic, na.rm = TRUE))             # sum of 'logic', ignoring NA

This will return:

# A tibble: 2 x 4
        date number_of_ids number_of_passed_date logic
      <date>         <int>                 <int> <int>
1 2017-11-04             3                     3     2
2 2017-11-05             1                     0     0
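
For anyone who wants to reproduce the result above, here is a minimal self-contained sketch. The way df is built is an assumption (the question only shows the printed table), as are the column types: Date for 'date' and 'passed', integer for 'id' and 'logic'.

```r
library(dplyr)

# Sample data rebuilt by hand from the table in the question.
# Assumption: 'date' and 'passed' are Date columns, 'id' and 'logic' are integers.
df <- data.frame(
  date   = as.Date(c("2017-11-04", "2017-11-04", "2017-11-04", "2017-11-05")),
  id     = c(101L, 102L, 103L, 104L),
  passed = as.Date(c("2017-11-06", "2017-11-06", "2017-11-08", NA)),
  logic  = c(1L, 0L, 1L, NA)
)

result <- df %>%
  group_by(date) %>%
  summarise(
    number_of_ids         = n(),                     # rows per date
    number_of_passed_date = sum(!is.na(passed)),     # non-missing 'passed' values
    logic                 = sum(logic, na.rm = TRUE) # sum of 'logic', ignoring NA
  )

print(result)
```

If the real columns are stored as character strings rather than dates, the grouping and counting should behave the same way; only the column type shown in the printed tibble would differ.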
clemens
  • It worked, thanks. I have never used dplyr before; I guess I have to start exploring it right away. – Harshit Sharma Nov 08 '17 at 05:34
  • Okay, earlier the output was exactly what is in your answer, but when I ran the same query again the output no longer shows the date column and contains only 1 row with 3 columns. Any idea why this is happening? – Harshit Sharma Nov 08 '17 at 12:10
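
A possible explanation for the single-row output, which the thread itself does not confirm: if the plyr package is attached after dplyr, plyr's summarise() masks dplyr's version and ignores the grouping, so the whole data frame collapses into one row. The sketch below assumes that cause; it checks which attached packages provide summarise() and calls the dplyr functions with an explicit namespace so masking cannot affect the result. The small df here is a hand-built stand-in for the asker's data.

```r
library(dplyr)

# Hand-built stand-in for the asker's data (assumption: only the two
# columns needed for the illustration).
df <- data.frame(
  date = c("2017-11-04", "2017-11-04", "2017-11-04", "2017-11-05"),
  id   = c(101L, 102L, 103L, 104L)
)

# List the attached packages that define summarise(); the first entry is
# the one a bare summarise() call would use.
print(find("summarise"))

# Explicit namespaces make the call independent of package load order.
result <- df %>%
  dplyr::group_by(date) %>%
  dplyr::summarise(number_of_ids = dplyr::n())

print(result)
```

If find("summarise") lists another package ahead of dplyr, either keep the dplyr:: prefix or restart the session and attach dplyr last.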

If I understand you correctly, you want to count the distinct non-missing values in each column for every value of df$date:

df <- read.table(text="date,id,passed,logic
2017-11-04,101,2017-11-06,1
2017-11-04,102,2017-11-06,0
2017-11-04,103,2017-11-08,1
2017-11-05,104,NA,NA", sep=",", header=TRUE, stringsAsFactors=FALSE)

aggregate(df, by=list(df$date), FUN=function(x) {sum(!is.na(unique(x)))})

Output:

     Group.1 date id passed logic
1 2017-11-04    1  3      2     2
2 2017-11-05    1  1      0     0
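
Note that because of unique(), the call above counts distinct values, which is why passed is 2 for 2017-11-04 (two rows share 2017-11-06). If every non-missing row should be counted instead, a variant along the following lines may be closer to what was asked. It is only a sketch: restricting the aggregation to the id and passed columns and naming the grouping column date are choices made here for illustration, not part of the original answer.

```r
# Same sample data as in the answer above.
df <- read.table(text = "date,id,passed,logic
2017-11-04,101,2017-11-06,1
2017-11-04,102,2017-11-06,0
2017-11-04,103,2017-11-08,1
2017-11-05,104,NA,NA", sep = ",", header = TRUE, stringsAsFactors = FALSE)

# Count non-missing values per date without de-duplicating them.
# Only 'id' and 'passed' are aggregated, and the grouping column is
# named 'date' instead of the default 'Group.1'.
counts <- aggregate(
  df[c("id", "passed")],
  by  = list(date = df$date),
  FUN = function(x) sum(!is.na(x))
)

print(counts)
```

With the sample data this should give 3 ids and 3 passed dates for 2017-11-04, and 1 id and 0 passed dates for 2017-11-05.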
Patricio Moracho