2

I have a data frame that looks like the table below that keeps track of a person visiting a store in a certain month. I want to create a new column, Total_Visits, that is a count of the number of times a certain ID visited the store during a certain month. In the below example, for date 6-13 and ID 23, the Total_Visits would have 3 in any row where date == 6-13, and ID == 23.

Date    ID   
6-13    23   
6-13    34   
6-13    23   
6-13    23   
7-13    23   

Data frame I'm looking for would be

Date    ID    Total_Visits   
6-13    23    3
6-13    34    1
6-13    23    3
6-13    23    3
7-13    23    1

While I assume there is some sort of acast function to ensure that I don't have to loop through this (30,000 rows), I would be OK with a loop if vectorization did not work.

David Arenburg
  • 91,361
  • 17
  • 137
  • 196
dward4
  • 1,615
  • 3
  • 15
  • 30
  • 1
    `ave(1:NROW(dat), dat$Date, dat$ID, FUN = length)` – d.b Jul 25 '17 at 18:07
  • 2
    @Masoud `df$Total_Visits <- ave(1:NROW(df), df$Date, df$ID, FUN = length)` (or with `=`) is probably the canonical base R method. – lmo Jul 25 '17 at 18:30

2 Answers2

3

You can use dplyr package:

library(dplyr)
df %>%  group_by(Date, ID) %>% mutate(Total_Visits = n())

# # A tibble: 5 x 3 
# # Groups:   Date, ID [3] 
#     Date    ID Total_Visits 
#   <fctr> <int>        <int> 
# 1   6-13    23            3 
# 2   6-13    34            1 
# 3   6-13    23            3 
# 4   6-13    23            3 
# 5   7-13    23            1

Use data.frame on the output to make it a dataframe.

Update:

Or using data.table package:

library(data.table)
setDT(df)[, Total_Visits:=.N, by=c("Date","ID")]

df

#    Date ID Total_Visits 
# 1: 6-13 23            3 
# 2: 6-13 34            1 
# 3: 6-13 23            3 
# 4: 6-13 23            3 
# 5: 7-13 23            1

Data:

df <- structure(list(Date = structure(c(1L, 1L, 1L, 1L, 2L), .Label = c("6-13", 
      "7-13"), class = "factor"), ID = c(23L, 34L, 23L, 23L, 23L)), .Names = c("Date", 
      "ID"), class = "data.frame", row.names = c(NA, -5L))
Community
  • 1
  • 1
M--
  • 25,431
  • 8
  • 61
  • 93
2

I like using data.table for these operations. It's also the fastest solution.

library(data.table)

dat = read.table("clipboard", header = TRUE)
setDT(dat)
> dat[ , .(visits = .N), by = .(Date, ID)]
   Date ID visits
1: 6-13 23      3
2: 6-13 34      1
3: 7-13 23      1
Kristofersen
  • 2,736
  • 1
  • 15
  • 31