I have a very large dataset showing logins to a website. I'm trying to calculate the frequency of logins by username.

What I hope to get is a table like the one below, where the period is the column header, the number of logins is the row name, and each cell is the number of users who logged in that many times on the given day, i.e.

[weekday] [Mon][Tue][etc]
[logins ]
[      1][123][456][789]
[      2][987][654][321]
[    etc][123][456][789]

The source data is simply the login ID and the date/time of each login. I've been able to add columns for the month, the name of the day, and the day number, all derived from the login date.

Ideally I'd like to be able to get the same sort of summary as above for each category (Month, day of month, day name).

library(lubridate)
library(dplyr)
library(rpivotTable)

# datasource holds the raw login records (USERID, Date)
df <- data.frame(datasource)

df$MonthNumber <- month(df$Date)
df$DayNumber <- wday(df$Date, label = FALSE)             # day of week as a number
df$DayName <- wday(df$Date, label = TRUE, abbr = FALSE)  # full day name

# Problem is here: I don't know how to get the count of user logins per day
# (DayCount does not exist yet, so this call fails)
Results <- xtabs(~ DayCount + c(DayName, USERID), df)
write.csv(Results, file = "weekdata.csv")
Results
tsuimark
  • You might try using the `group_by` function in `dplyr`. Since you want the number of logins per day for each user, we group by the user id and the day. `results <- df %>% group_by(USERID, Date) %>% summarise(Logins = n())`. The `n()` function just counts the number of rows in a group. Also, note that we don't need to create other date variables since we can just group on the original date value. – tblznbits Jan 19 '16 at 16:54
  • A [reproducible example](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) would be very helpful here. – alexwhitworth Jan 19 '16 at 19:51
  • Thanks for the guidance. Your solution tells me how many times a user logged in on a particular date. What I want to know is more like "what is the busiest day of the week?", so I need the number of times a given user logs in on a Monday, for example. I feel like if it were more like `results <- df %>% group_by(DayName, Logins = n()) %>% summarise(Logins = n())` I'd get something like what I want. – tsuimark Jan 20 '16 at 11:55
  • I've found that `results <- table(USER_ID,Date)` gives something along the lines of what I'm looking for, in that it gives the number of times a user has logged in on a given day of the week. To flip that into frequency by day, I've exported it to Excel and used COUNTIF to get what I'm looking for. I think there's probably a better way within R. – tsuimark Jan 20 '16 at 16:52
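
Following on from the comments, one possible way to stay inside R is to count twice: first the logins per user per day name, then the number of users at each login count. The snippet below is only a sketch under assumptions: the sample data frame is made up for illustration, the column names `USERID` and `Date` are taken from the question's code, and logins are pooled across all dates that share a day name (all Mondays together), which may or may not be the intended definition.

```r
library(dplyr)
library(lubridate)

# Hypothetical sample data standing in for the real login log
df <- data.frame(
  USERID = c("a", "a", "a", "b", "b", "c", "c", "c"),
  Date = as.POSIXct(c(
    "2016-01-18 09:00:00", "2016-01-18 13:30:00", "2016-01-19 08:15:00",
    "2016-01-18 10:00:00", "2016-01-19 11:00:00",
    "2016-01-19 09:00:00", "2016-01-19 12:00:00", "2016-01-19 17:45:00"
  ), tz = "UTC")
)

# Day name as an ordered factor (labels depend on the session locale)
df$DayName <- wday(df$Date, label = TRUE, abbr = FALSE)

# Step 1: one row per user and day name, with that user's login count
per_user <- df %>% count(USERID, DayName, name = "Logins")

# Step 2: cross-tabulate login count against day name;
# each cell is the number of users with that many logins on that day
Results <- xtabs(~ Logins + DayName, data = per_user)
Results
```

Swapping `DayName` for `MonthNumber` or `DayNumber` in both steps should give the same kind of summary for the other categories, and `write.csv(Results, file = "weekdata.csv")` can still be used to export the result.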

0 Answers