Creating a new column where each element is the count of subsets of two other columns, loop free

Question

I have a data frame that looks like the table below that keeps track of a person visiting a store in a certain month. I want to create a new column, Total_Visits, that is a count of the number of times a certain ID visited the store during a certain month. In the below example, for date 6-13 and ID 23, the Total_Visits would have 3 in any row where date == 6-13, and ID == 23.

Data frame I'm looking for would be

Date    ID    Total_Visits   
6-13    23    3
6-13    34    1
6-13    23    3
6-13    23    3
7-13    23    1

While I assume there is some sort of acast function to ensure that I don't have to loop through this (30,000 rows), I would be OK with a loop if vectorization did not work.

@Masoud `df$Total_Visits <- ave(1:NROW(df), df$Date, df$ID, FUN = length)` (or with `=`) is probably the canonical base R method. — lmo, Jul 25 '17 at 18:30

score 3 · Accepted Answer · edited Jun 20 '20 at 09:12

You can use dplyr package:

library(dplyr)
df %>%  group_by(Date, ID) %>% mutate(Total_Visits = n())

# # A tibble: 5 x 3 
# # Groups:   Date, ID [3] 
#     Date    ID Total_Visits 
#   <fctr> <int>        <int> 
# 1   6-13    23            3 
# 2   6-13    34            1 
# 3   6-13    23            3 
# 4   6-13    23            3 
# 5   7-13    23            1

Use data.frame on the output to make it a dataframe.

Update:

Or using data.table package:

library(data.table)
setDT(df)[, Total_Visits:=.N, by=c("Date","ID")]

df

#    Date ID Total_Visits 
# 1: 6-13 23            3 
# 2: 6-13 34            1 
# 3: 6-13 23            3 
# 4: 6-13 23            3 
# 5: 7-13 23            1

Data:

df <- structure(list(Date = structure(c(1L, 1L, 1L, 1L, 2L), .Label = c("6-13", 
      "7-13"), class = "factor"), ID = c(23L, 34L, 23L, 23L, 23L)), .Names = c("Date", 
      "ID"), class = "data.frame", row.names = c(NA, -5L))

score 2 · Answer 2 · answered Jul 25 '17 at 18:12

2

I like using data.table for these operations. It's also the fastest solution.

library(data.table)

dat = read.table("clipboard", header = TRUE)
setDT(dat)
> dat[ , .(visits = .N), by = .(Date, ID)]
   Date ID visits
1: 6-13 23      3
2: 6-13 34      1
3: 7-13 23      1

answered Jul 25 '17 at 18:12

Kristofersen

2,736
1
15
31

I was posting data.table solution before seeing your solution. They are not exactly the same. Cheers. +1 – M-- Jul 25 '17 at 18:19
This worked as well, thank you @Masoud and Kristofersen – dward4 Jul 25 '17 at 18:30

Creating a new column where each element is the count of subsets of two other columns, loop free

2 Answers2

Update: