I am trying to count the number of distinct days of observation for each individual.

My data look like this

       idno day av sumtime
1  103799_1   1  1     400
2  103799_1   1  5     130
3  103799_1   1  7      60
4  103799_1   4  1     410
5  103799_1   4  5      50
....

idno is the personal identifier, day is the day of observation, and av and sumtime describe the recorded activities.

What I am trying to achieve is this

      idno day av sumtime ndist
1 103799_1   1  1     400     2
2 103799_1   1  5     130     2
3 103799_1   1  7      60     2
4 103799_1   4  1     410     2
5 103799_1   4  5      50     2
...

I want to count the number of distinct observation days for each individual (idno). So 2 means that this idno was observed on two different days.

If I simply do this

dt %>% group_by(idno, day) %>% mutate(n())  

I get

        idno day av sumtime n()
1  103799_1   1  1     400   3
2  103799_1   1  5     130   3
3  103799_1   1  7      60   3
4  103799_1   4  1     410   3
5  103799_1   4  5      50   3

This counts the rows within each (idno, day) pair rather than the number of distinct days per idno.

The only way I have found to do this is a rather cumbersome manoeuvre:

dt %>% select(idno, day) %>% distinct() %>% group_by(idno) %>% 
mutate(ndist = n()) %>% merge(dt, .) 

Is there a more straightforward way to do this, for example without merging?

Thank you very much.

dt = structure(list(idno = c("103799_1", "103799_1", "103799_1", "103799_1", 
"103799_1", "103799_1", "103799_2", "103799_2", "103799_2", "103799_2", 
"110594_1", "129380_1", "129380_1", "129380_1", "129380_1", "129380_2", 
"129380_2", "129380_2", "129380_2", "129380_2", "129380_2", "140090_1", 
"140090_1", "140090_2", "140090_2", "155699_1", "155699_1", "155699_2", 
"155699_2", "201314_1"), day = c(1L, 1L, 1L, 4L, 4L, 4L, 1L, 
4L, 4L, 4L, 1L, 6L, 6L, 6L, 7L, 6L, 6L, 6L, 7L, 7L, 7L, 4L, 7L, 
4L, 7L, 1L, 2L, 1L, 2L, 5L), av = c(1L, 5L, 7L, 1L, 5L, 7L, 7L, 
1L, 5L, 7L, 7L, 1L, 5L, 7L, 5L, 1L, 5L, 7L, 1L, 5L, 7L, 7L, 7L, 
7L, 7L, 7L, 7L, 7L, 7L, 1L), sumtime = c(400, 130, 60, 410, 50, 
40, 90, 470, 90, 20, 150, 270, 30, 90, 10, 490, 40, 60, 510, 
40, 20, 20, 60, 110, 110, 70, 40, 150, 10, 270)), class = "data.frame", .Names = c("idno", 
"day", "av", "sumtime"), row.names = c(NA, -30L))
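
For reference, here is a minimal sketch of one possible merge-free approach, assuming dplyr's n_distinct() is acceptable: group by idno only and count the distinct values of day within each group. The name dt_small is a hypothetical subset built from a few rows of the data above, used only to keep the example self-contained.

```r
library(dplyr)

# Hypothetical small subset of the data above (values copied from dt)
dt_small <- data.frame(
  idno    = c("103799_1", "103799_1", "103799_1", "103799_1", "110594_1"),
  day     = c(1L, 1L, 4L, 4L, 1L),
  av      = c(1L, 5L, 1L, 5L, 7L),
  sumtime = c(400, 130, 410, 50, 150),
  stringsAsFactors = FALSE
)

# Group by individual only, then count the distinct days inside each group;
# mutate() keeps every original row, so no merge is needed.
res <- dt_small %>%
  group_by(idno) %>%
  mutate(ndist = n_distinct(day)) %>%
  ungroup()

print(res)
```

Because the grouping is on idno alone, every row of an individual gets the same ndist value. The same call should work on the full dt, though I have only reasoned it through on this subset.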