Count with condition

Question

I would like to count how often one variable, id_task, occurs per month. The month variable ranges from 1 to 12.

So far I have only managed to count how often each task occurs, using the code below. As output, I would like to know how often each task occurs in every month, in order to detect which month has the most/least tasks.

count(df,c('id_task'))

id_task id_user day completion_yesno day_created has_deadline deadline created_before active overdue completed_before month 
16416   37033    5272  61                0          61            1      172              0      0       0                0
16417   37033    5272  62                0          61            1      172              2      2       0                0
16418   37033    5272  63                0          61            1      172              2      2       0                0
16419   37033    5272  64                0          61            1      172              2      2       0                0
16420   37033    5272  65                0          61            1      172              2      2       0                0
16421   37033    5272  66                0          61            1      172              2      2       0                0
16422   37033    5272  67                0          61            1      172              2      2       0                0
16423   37033    5272  68                0          61            1      172              2      2       0                0
16424   37033    5272  69                0          61            1      172              2      2       0                0
16425   37033    5272  70                0          61            1      172              2      2       0                0
16426   37033    5272  71                0          61            1      172              2      2       0                0
16427   37033    5272  72                0          61            1      172              2      2       0                0
16428   37033    5272  73                0          61            1      172              2      2       0                0
16429   37033    5272  74                0          61            1      172              2      2       0                0
16430   37033    5272  75                0          61            1      172              2      2       0                0
16431   37033    5272  76                0          61            1      172              2      2       0                0
16432   37033    5272  77                0          61            1      172              2      2       0                0
16433   37033    5272  78                0          61            1      172              2      2       0                0
16434   37033    5272  79                0          61            1      172              2      2       0                0
16435   37033    5272  80                0          61            1      172              2      2       0                0

desired output:

id_task  month freq
1         12    3
2          1    20

Please provide sample data so that we can understand the scope. This can be done with `dput(head(df))` or building new data programmatically with `data.frame(...)`. Also, what is your desired output *given that sample data*? This might mean you have to manually count it once to get the point across. Lastly, what have you tried? Refs: https://stackoverflow.com/q/5963269, [mcve], and https://stackoverflow.com/tags/r/info. — r2evans, May 26 '20 at 19:51
@r2evans I have added the data and desired output. Somehow the table was destroyed, and I don't know how to fix it. — Amy, May 26 '20 at 19:58
I don't know what you mean by "destroyed": I suggested an edit to the question that put the data in code blocks (much easier to read). Do you mean that the tabular fixed-width format of unformatted data was hard to see? — r2evans, May 26 '20 at 19:59
@r2evans just noticed that, it looks way better now. I also have a month variable, which is month c(1,2,3,6,12) — Amy, May 26 '20 at 20:00
Where does `'task'` come into play? I see a column `"id_task"`, but tasks 1 and 2 do not appear in your raw data. (It's really helpful when your intended output corresponds to the actual data in the sample.) Also, where does `month` come in? I don't see it anywhere in the data. — r2evans, May 26 '20 at 20:00
I tried to call them month and task for ease, but task is id_task, and month is in my df and is just a simple variable with numbers from 1 to 12 indicating the months. I have edited this now — Amy, May 26 '20 at 20:04
Do you mean `dat %>% group_by(id_task, month) %>% count(name = "freq")`? (using `dplyr`). If not, please update either your sample data or your desired output so that they match. Otherwise it's much of a guess. — r2evans, May 26 '20 at 20:09
Final point, Amy, it really helps when your code and sample are *consistent*. In addition to the data (previous comment), there is also `'Id_task'` versus `id_task`. It can be difficult to get through copy/paste typos to see the real problem when we aren't certain that it isn't a real typo in the actual code. — r2evans, May 26 '20 at 20:11

Jan · Accepted Answer · 2020-05-26T20:40:40.713

3

If you want to count the occurrences of all month × task combinations, `table` is your function:

table(df[, c("month", "id_task")])

You can reproduce this with the following dummy data:

df <- data.frame(id_task= sample.int(15, 100, replace = TRUE), month = rep(1:12, length.out=100))
table(df[, c("month", "id_task")])
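The cross-table above is wide (one column per task). A minimal sketch of how it could be flattened into the three-column id_task / month / freq layout from the question, assuming the same dummy data; the `set.seed()` call and the names `tab` and `freq` are added here for illustration only:

```r
# Dummy data in the same shape as above (seed added only for reproducibility).
set.seed(1)
df <- data.frame(id_task = sample.int(15, 100, replace = TRUE),
                 month   = rep(1:12, length.out = 100))

# Cross-tabulate, then flatten to one row per (id_task, month) pair.
tab  <- table(df[, c("id_task", "month")])
freq <- as.data.frame(tab, responseName = "freq")

# Drop combinations that never occur in the data.
freq <- freq[freq$freq > 0, ]
head(freq)
```

Note that `id_task` and `month` come back as factors after `as.data.frame()`; convert them with `as.integer(as.character(...))` if numeric columns are needed.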

If you want the total number of tasks per month, just drop the task column and run it like this:

table(df[, c("month")])
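To find which month has the most or fewest tasks, the per-month table can be filtered by its maximum and minimum. This is a sketch on the same dummy data; the names `per_month`, `busiest` and `quietest` are made up for illustration:

```r
# Same dummy data: 100 rows, months cycling through 1..12.
set.seed(1)
df <- data.frame(id_task = sample.int(15, 100, replace = TRUE),
                 month   = rep(1:12, length.out = 100))

# Number of rows (tasks) per month.
per_month <- table(df$month)

# Keep every month that ties for the maximum / minimum count.
busiest  <- per_month[per_month == max(per_month)]
quietest <- per_month[per_month == min(per_month)]

names(busiest)   # months with the most tasks
names(quietest)  # months with the fewest tasks
```

With this dummy data, months 1 to 4 each have 9 rows and months 5 to 12 each have 8, because 100 rows are spread cyclically over 12 months. Comparing against `max()` rather than using `which.max()` keeps all tied months instead of only the first one.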

edited May 26 '20 at 20:40

answered May 26 '20 at 20:11


If you wrap the table in `as.data.frame(...)`, you can get the three-column output in the desired output. – r2evans May 26 '20 at 20:19
@Jan @r2evans the combination makes it work! Now I have every month for every task. Is there a way of aggregating the overall tasks for every month? – Amy May 26 '20 at 20:23
@Jan i have created dummy variables for every now, would it be possible to state how often tasks occur per month? – Amy May 26 '20 at 20:33

score 2 · Answer 2 · answered May 26 '20 at 20:01


With the dplyr package, you can do the following:

library(dplyr)

# count how often each id_task occurs within each month
data %>%
  group_by(month) %>%
  count(id_task)

I think this will do. (:
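For a self-contained version, here is a sketch using the `count(..., name = "freq")` form suggested in the question's comments. It assumes dplyr is installed; the dummy data frame `dat` and the names `task_month` and `per_month` are illustrative, not from the original post:

```r
library(dplyr)

# Dummy data (hypothetical): 100 rows, months cycling through 1..12.
set.seed(1)
dat <- data.frame(id_task = sample.int(15, 100, replace = TRUE),
                  month   = rep(1:12, length.out = 100))

# One row per (id_task, month) combination with its frequency.
task_month <- dat %>%
  count(id_task, month, name = "freq")

# Total number of rows (tasks) per month, busiest month first.
per_month <- dat %>%
  count(month, name = "freq") %>%
  arrange(desc(freq))

head(task_month)
per_month
```

Passing both columns to `count()` is equivalent to grouping by them first, and unlike the base `table()` route it only returns combinations that actually occur.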


Gabriel Reis


@GabrielReis, consider adding `id_task` to your grouping, it might be more inline with what is being asked (though the OP is not clear on this). – r2evans May 26 '20 at 20:10
