I have a serious performance issue with a fairly simple operation. In fact, I get no result at all, even after letting the code run for several hours.
My data frame consists of approximately 400k records of 10 variables. The code for the operation is:
a2 <- dat %>%
  group_by(X1, X2, X3, X4) %>%
  summarise(a = length(unique(ID)))
Here X1 to X4 are all factors (1600 to 5600 levels). Could the issue be that my ID variable is also a factor (184573 levels)? If so, how can I fix this? I used similar code on a data frame where ID was an int, and that worked fine.
However, with my current dataset, changing to int is not possible, and changing to chr does not seem to make sense. Does anyone have an answer?