I am trying to count, for each individual, the number of distinct days on which they were observed.
My data look like this:
      idno day av sumtime
1 103799_1   1  1     400
2 103799_1   1  5     130
3 103799_1   1  7      60
4 103799_1   4  1     410
5 103799_1   4  5      50
...
idno is the personal identifier, day is the day of observation, and av and sumtime are the recorded activities.
What I am trying to achieve is this:
      idno day av sumtime ndist
1 103799_1   1  1     400     2
2 103799_1   1  5     130     2
3 103799_1   1  7      60     2
4 103799_1   4  1     410     2
5 103799_1   4  5      50     2
...
In other words, I want to count, for each individual (idno), the number of distinct days of observation. Here ndist = 2 means that idno 103799_1 was observed on two different days.
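Put differently, ndist is the number of unique values of day within each idno, repeated on every row belonging to that individual. A minimal base R sketch of that definition uses ave() with length(unique(x)); it assumes only idno and day matter for the count, and the small data frame is the first eleven idno/day values of the data given at the end of this post (av and sumtime are left out because they do not affect ndist).

```r
# First eleven idno/day values of the posted data
dt <- data.frame(
  idno = c(rep("103799_1", 6), rep("103799_2", 4), "110594_1"),
  day  = c(1L, 1L, 1L, 4L, 4L, 4L, 1L, 4L, 4L, 4L, 1L)
)

# For every row, count the distinct days observed for that row's idno.
# ave() returns a vector as long as dt$day, so it can be assigned directly.
dt$ndist <- ave(dt$day, dt$idno, FUN = function(x) length(unique(x)))

print(dt)
```

Because ave() keeps the original row order and length, no merge step is involved.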
If I simply do this:
library(dplyr)
dt %>% group_by(idno, day) %>% mutate(n())
I get:
      idno day av sumtime n()
1 103799_1   1  1     400   3
2 103799_1   1  5     130   3
3 103799_1   1  7      60   3
4 103799_1   4  1     410   3
5 103799_1   4  5      50   3
This gives the number of rows per individual and day, not the number of distinct days per individual.
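The value 3 appears because n() counts the rows in each (idno, day) group, and individual 103799_1 has three rows on each of its two days. A minimal merge-free sketch, assuming dplyr is installed: group by idno alone and count the unique days with n_distinct(). The data frame here is a subset (first eleven rows) of the posted data.

```r
library(dplyr)

# First eleven rows of the posted data
dt <- data.frame(
  idno    = c(rep("103799_1", 6), rep("103799_2", 4), "110594_1"),
  day     = c(1L, 1L, 1L, 4L, 4L, 4L, 1L, 4L, 4L, 4L, 1L),
  av      = c(1L, 5L, 7L, 1L, 5L, 7L, 7L, 1L, 5L, 7L, 7L),
  sumtime = c(400, 130, 60, 410, 50, 40, 90, 470, 90, 20, 150)
)

res <- dt %>%
  group_by(idno) %>%                   # one group per individual, not per (idno, day)
  mutate(ndist = n_distinct(day)) %>%  # distinct days for this individual, on every row
  ungroup()

print(res)
```

mutate() keeps every original row, so the count lands directly on the full data without the select/distinct/merge round trip.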
The only way I have found to do this is a rather cumbersome workaround:
dt %>% select(idno, day) %>% distinct() %>%   # one row per (idno, day) pair
  group_by(idno) %>% mutate(ndist = n()) %>%   # number of distinct days per idno
  merge(dt, .)                                 # join the counts back onto dt
Is there a more straightforward way to do this, for example without merging?
Thank you very much.

Reproducible data:
dt = structure(list(idno = c("103799_1", "103799_1", "103799_1", "103799_1",
"103799_1", "103799_1", "103799_2", "103799_2", "103799_2", "103799_2",
"110594_1", "129380_1", "129380_1", "129380_1", "129380_1", "129380_2",
"129380_2", "129380_2", "129380_2", "129380_2", "129380_2", "140090_1",
"140090_1", "140090_2", "140090_2", "155699_1", "155699_1", "155699_2",
"155699_2", "201314_1"), day = c(1L, 1L, 1L, 4L, 4L, 4L, 1L,
4L, 4L, 4L, 1L, 6L, 6L, 6L, 7L, 6L, 6L, 6L, 7L, 7L, 7L, 4L, 7L,
4L, 7L, 1L, 2L, 1L, 2L, 5L), av = c(1L, 5L, 7L, 1L, 5L, 7L, 7L,
1L, 5L, 7L, 7L, 1L, 5L, 7L, 5L, 1L, 5L, 7L, 1L, 5L, 7L, 7L, 7L,
7L, 7L, 7L, 7L, 7L, 7L, 1L), sumtime = c(400, 130, 60, 410, 50,
40, 90, 470, 90, 20, 150, 270, 30, 90, 10, 490, 40, 60, 510,
40, 20, 20, 60, 110, 110, 70, 40, 150, 10, 270)), class = "data.frame", .Names = c("idno",
"day", "av", "sumtime"), row.names = c(NA, -30L))
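If a solution outside dplyr is acceptable, the same per-individual count can be sketched with the data.table package (assuming it is installed): uniqueN() counts distinct values, and := adds the column by reference, so nothing needs to be merged back. The small table reuses the first eleven idno/day values of the data above.

```r
library(data.table)

# First eleven idno/day values of the posted data
dt <- data.table(
  idno = c(rep("103799_1", 6), rep("103799_2", 4), "110594_1"),
  day  = c(1L, 1L, 1L, 4L, 4L, 4L, 1L, 4L, 4L, 4L, 1L)
)

# := adds ndist in place; uniqueN(day) is evaluated separately within each idno
dt[, ndist := uniqueN(day), by = idno]

print(dt)
```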