Really simple question, but somehow I am stuck. I have panel data of users' daily tasks. I want to find out how many tasks one user does on average, and how long one user takes per task on average. I would also like to plot this data if possible. I ran the usual descriptive statistics, but I feel they are not exactly what I need. The data looks somewhat like this: user (1, 1, 1, 2, 2, 3), task (1, 1, 2, 3, 4, 5), day (1, 2, 1, 1, 2, 1), task creation (1, 1, 1, 4, 4, 3), deadline (5, 5, 5, 9, 9, 4).

      id_task id_user day completion_yesno day_created has_deadline deadline created_before active overdue completed_before
16416   37033    5272  61                0          61            1      172              0      0       0                0
16417   37033    5272  62                0          61            1      172              2      2       0                0
16418   37033    5272  63                0          61            1      172              2      2       0                0
16419   37033    5272  64                0          61            1      172              2      2       0                0
16420   37033    5272  65                0          61            1      172              2      2       0                0
16421   37033    5272  66                0          61            1      172              2      2       0                0
16422   37033    5272  67                0          61            1      172              2      2       0                0
16423   37033    5272  68                0          61            1      172              2      2       0                0
16424   37033    5272  69                0          61            1      172              2      2       0                0
16425   37033    5272  70                0          61            1      172              2      2       0                0
16426   37033    5272  71                0          61            1      172              2      2       0                0
16427   37033    5272  72                0          61            1      172              2      2       0                0
16428   37033    5272  73                0          61            1      172              2      2       0                0
16429   37033    5272  74                0          61            1      172              2      2       0                0
16430   37033    5272  75                0          61            1      172              2      2       0                0
16431   37033    5272  76                0          61            1      172              2      2       0                0
16432   37033    5272  77                0          61            1      172              2      2       0                0
16433   37033    5272  78                0          61            1      172              2      2       0                0
16434   37033    5272  79                0          61            1      172              2      2       0                0
16435   37033    5272  80                0          61            1      172              2      2       0                0

In this case one user would work on 2 tasks on average, but I only found that out by counting.

Paul Rougieux
Amy
  • Is this a data.frame. Can you show the expected – akrun May 25 '20 at 21:39
  • @akrun Yes, it is a data frame. What do you mean by expected? – Amy May 25 '20 at 21:41
  • I meant the expected output – akrun May 25 '20 at 21:41
  • Is this a data.frame? Can you please use `dput` so that the structure is clear? – akrun May 25 '20 at 21:48
  • Just use `head(yourdata, 20)` and show the expected output based on that – akrun May 25 '20 at 21:58
  • This answer explains [how to make a great reproducible example](https://stackoverflow.com/a/5965451/2641825). In your case, I would recommend building the small data frame given at the end of your first paragraph, with the variability that you expect in the variables `completion_yesno` and `completed_before`. Those variables only have zero values in your example, and it would be more meaningful to have some variability there. – Paul Rougieux May 27 '20 at 06:32

1 Answer

Keep only the user, task and completion columns. Remove duplicated lines, then group by user and compute the number of completed tasks for each user:

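    library(dplyr)

    # One row per user, task and completion status,
    # then count the completed tasks of each user
    df_by_user <- df %>%
        select(id_user, id_task, completion_yesno) %>%
        distinct() %>%
        group_by(id_user) %>%
        summarise(n = sum(completion_yesno))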

Then compute the average:

    mean(df_by_user$n)
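
If the goal is instead the number of distinct tasks per user, regardless of completion, a minimal sketch could look like the following. It rebuilds the toy data from the question (made-up values, not the real panel). It also assumes that "how long a task takes" can be approximated by the number of distinct days on which the task appears in the panel. The names `df_by_task`, `n_days`, `n_tasks` and `mean_days` are only illustrative.

```r
library(dplyr)

# Toy panel modelled on the example in the question: one row per task and day.
# These values are assumptions for illustration only.
df <- data.frame(
  id_user = c(1, 1, 1, 2, 2, 3),
  id_task = c(1, 1, 2, 3, 4, 5),
  day     = c(1, 2, 1, 1, 2, 1)
)

# One row per task: its user and the number of distinct days it appears on
# (used here as a rough proxy for task duration).
df_by_task <- df %>%
  group_by(id_user, id_task) %>%
  summarise(n_days = n_distinct(day), .groups = "drop")

# One row per user: number of distinct tasks and mean days per task.
df_by_user <- df_by_task %>%
  group_by(id_user) %>%
  summarise(n_tasks = n(), mean_days = mean(n_days))

mean(df_by_user$n_tasks)  # average number of distinct tasks per user
mean(df_by_task$n_days)   # average number of observed days per task

# Simple base-graphics plot of the number of tasks per user.
barplot(df_by_user$n_tasks,
        names.arg = df_by_user$id_user,
        xlab = "id_user", ylab = "Number of distinct tasks")
```

With real data, a duration based on `day_created` and the day of completion may be more appropriate than counting rows; that depends on how the panel was built.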
Paul Rougieux
  • Could you explain what exactly the output would mean? It is way too high to be the average number of tasks per user, because every task occurs multiple times. However, I am only interested in how often one unique task occurs. – Amy May 25 '20 at 23:12
  • I see, then you may want to first `group_by(id_user, completion_yesno)`, then compute the sum of that. – Paul Rougieux May 26 '20 at 00:01