0

This is my transaction data

from_id       to_id      date_trx      week    amount
<fctr>        <fctr>     <date>        <dbl>   <dbl>
6644           6934       2005-01-01    1      700
6753           8456       2005-01-01    1      600
9242           9333       2005-01-01    1      1000
9843           9115       2005-01-01    1      900 
7075           6510       2005-01-02    1      400 
8685           7207       2005-01-02    1      1100   

...            ...        ...           ...    ...

9866           6697       2010-12-31    313    95.8
9866           5992       2010-12-31    313    139.1
9866           5797       2010-12-31    313    72.1
9866           9736       2010-12-31    313    278.9
9868           8644       2010-12-31    313    242.8
9869           8399       2010-12-31    313    372.2

I want to count the number of unique to_ids for each from_ids at each week: That is:

data <- data %>% 
  group_by(week,from_id) %>% 
  mutate(weekly_distinct_accounts=n_distinct(to_id))

But, it seems like computation will never end. What is the efficient way to do this? I also tried other functions mentioned here, but they couldn't be helpful either

zx8754
  • 52,746
  • 12
  • 114
  • 209
Leyla Alkan
  • 355
  • 3
  • 12

1 Answers1

0

In case you want to store the result in data you can use ave.

data$weekly_distinct_accounts <- ave(data$to_id, data$from_id, data$week
  , FUN=function(x) length(unique(x)))

or using duplicated

data$weekly_distinct_accounts <- ave(data$to_id, data$from_id, data$week
  , FUN=function(x) sum(!duplicated(x)))

In case you just need the sum per group you can use aggregate.

aggregate(to_id ~ from_id + week, data, function(x) length(unique(x)))

or

aggregate(to_id ~ from_id + week, data, function(x) sum(!duplicated(x)))

or

aggregate(to_id ~ ., unique(data[c("to_id", "from_id", "week")]), length)
GKi
  • 37,245
  • 2
  • 26
  • 48