I have a data frame with the following variables. The real data covers a whole month.

  tripData.starttime tripData.gender
1  1/1/2016 00:00:41               1
2  1/1/2016 00:00:45               1
3  1/1/2016 00:00:48               2
4  1/1/2016 00:01:06               2
5  1/1/2016 00:01:12               1
6  1/1/2016 00:01:19               1

I am trying to group by date. To do that, I took a substring of the timestamp:

temp$starttime <- substr(temp$starttime,1,9)

With the above call I only get the date part of each timestamp.
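A fixed-width `substr()` depends on how many digits the month and day have (`1/1/2016` versus `12/31/2016`). A minimal sketch of parsing the date instead, assuming the column holds character timestamps in month/day/year order as in the sample above; the data frame `temp` below is a made-up stand-in, not the real data:

```r
# Hypothetical stand-in for the real data; timestamps are m/d/Y H:M:S strings.
temp <- data.frame(
  starttime = c("1/1/2016 00:00:41", "1/1/2016 00:01:06", "12/31/2016 23:59:59"),
  gender    = c(1, 2, 1),
  stringsAsFactors = FALSE
)

# as.Date() with an explicit format parses the leading date and ignores the
# trailing time, so it works for one- and two-digit months and days alike.
temp$date <- as.Date(temp$starttime, format = "%m/%d/%Y")

print(temp$date)
```

The result is a proper `Date` column, which sorts chronologically and can be used directly as a grouping variable.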

I want to count the number of males and females on each day. With the expression below I only get the count of all the males:

(nrow(data[data$gender == "1", ]))

The output should be:

Date         Male          Female
1/1/2016     238            987
1/2/2016     554            210
1/3/2016     443            334
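One possible way to get a table of this shape in base R is to cross-tabulate the date against the gender. This is only a sketch under assumptions: the data frame `trips` and its values are made up, and the coding 1 = male, 2 = female is assumed rather than confirmed by the data shown.

```r
# Made-up sample in the same shape as the data above.
trips <- data.frame(
  starttime = c("1/1/2016 00:00:41", "1/1/2016 00:00:45", "1/1/2016 00:00:48",
                "1/2/2016 08:15:00", "1/2/2016 09:30:10"),
  gender    = c(1, 1, 2, 2, 1),
  stringsAsFactors = FALSE
)

trips$date <- as.Date(trips$starttime, format = "%m/%d/%Y")

# Assumption: 1 = male, 2 = female. Any other code becomes NA
# and is dropped by table().
trips$sex <- factor(trips$gender, levels = c(1, 2), labels = c("Male", "Female"))

# Cross-tabulate date against sex, then build a data frame
# with one row per day and one column per sex.
counts <- table(trips$date, trips$sex)
result <- data.frame(
  Date   = as.Date(rownames(counts)),
  Male   = as.vector(counts[, "Male"]),
  Female = as.vector(counts[, "Female"])
)

print(result)
```

Using a factor with explicit levels keeps the `Male` and `Female` columns present even on a day when one of them has no trips.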

I also tried:

agg.count <- aggregate(day(DT$starttime) ~ DT$gender*DT$starttime, DT, FUN="length")

The output is:

     DT$gender DT$starttime day(DT$starttime)
1         0   2016-01-01              2524
2         1   2016-01-01              6322
3         2   2016-01-01              2163
4         0   2016-01-02              2497
5         1   2016-01-02              8968
6         2   2016-01-02              3122
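This long output already holds the per-day counts; what is missing is one column per gender. A hedged sketch of that reshaping step with `xtabs()`, using the six rows above copied into a data frame with shorter, hypothetical column names (`gender`, `date`, `n`), and assuming the codes 0/1/2 mean unknown/male/female:

```r
# Counts copied from the output above; column names are illustrative.
agg <- data.frame(
  gender = c(0, 1, 2, 0, 1, 2),
  date   = as.Date(c("2016-01-01", "2016-01-01", "2016-01-01",
                     "2016-01-02", "2016-01-02", "2016-01-02")),
  n      = c(2524, 6322, 2163, 2497, 8968, 3122)
)

# xtabs() sums n within each date/gender cell, giving one row per date
# and one column per gender code ("0", "1", "2").
wide <- as.data.frame.matrix(xtabs(n ~ date + gender, data = agg))

# Assumption: 0 = unknown, 1 = male, 2 = female.
names(wide) <- c("Unknown", "Male", "Female")

# Move the dates from the row names into a regular column.
wide$Date <- as.Date(rownames(wide))
rownames(wide) <- NULL
wide <- wide[, c("Date", "Unknown", "Male", "Female")]

print(wide)
```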
Raj Parekh
  • Possible duplicate of [Count number of rows within each group](https://stackoverflow.com/questions/9809166/count-number-of-rows-within-each-group) – BLT Nov 16 '17 at 23:47
  • Possible duplicate of [Aggregate by week in R](https://stackoverflow.com/questions/4309248/aggregate-by-week-in-r) – r2evans Nov 16 '17 at 23:50
  • @BLT not a duplicate. I have to group by date and add 2 columns. – Raj Parekh Nov 16 '17 at 23:50
  • `library(tidyverse);library(lubridate); data %>% mutate(tripData.gender = ifelse(tripData.gender==1, "Male", "Female")) %>% group_by(date=ymd(mdy_hms(tripData.starttime)), tripData.gender) %>% tally %>% spread(tripData.gender, n)` – eipi10 Nov 16 '17 at 23:51
  • @eipi10 I tried to install tidyverse twice but was unable to: `Warning in install.packages : installation of package ‘tidyverse’ had non-zero exit status`. Is there any other way? – Raj Parekh Nov 17 '17 at 00:23
  • I'm not sure why it's not installing. Try installing `tidyr` and `dplyr` individually. Those are the two `tidyverse` packages that my code uses. – eipi10 Nov 17 '17 at 00:45
  • @eipi10 tidyr and dplyr are already installed. Can you check my latest edit and suggest some changes? – Raj Parekh Nov 17 '17 at 00:52
  • I have only rudimentary knowledge of `data.table`, but first you'll need to convert your `tripData.starttime` column to Date format (right now it's probably either character or factor). It will be much easier to help you if you provide a reproducible example. At the very least, you should paste into your question the output of `dput(data[1:10, ])`. – eipi10 Nov 17 '17 at 00:57
  • @eipi10 I am working on the Citi Bike data set. The date has been converted to POSIXct. I have made some other edits. Can you please check? – Raj Parekh Nov 17 '17 at 01:07
