0

Due to some coding, the dataset is overly ordered. I would like to keep the columns, but would like the sum up every Action (= 21 unique terms). In order that it fits in a single row that counts the occurrences in each quartile (Q) of the procedure.

The goal is to tally predictors per instance of Q. The data is sorted in 21 Actions, and 4 Qs and 100 procedures. Overall tally is present, we want to cut in 4 pieces (time).

I also don't mind tallying all the Actions with removing Q as a column.

I want to have this, or this is what I expect from the result:

   procedure`action 1 Q1` `action 1 Q2` `action 2 Q1` `action 2 Q2`
    <dbl>         <dbl>         <dbl>         <dbl>         <dbl>
 1   1             4             2             2             3
 2   2             2             2             1             5
 3   3             ..            ..            ..            ..

But my dataframe looks like this:

   Q     procedure `action 1 Q1` `action 1 Q2` `action 2 Q1` `action 2 Q2`
   <fct>    <dbl>         <dbl>         <dbl>         <dbl>         <dbl>
 1 Q1           1             4             0             0             0
 2 Q1           1             0             0             2             0
 3 Q1           1             0             0             0             0
 4 Q1           1             0             0             0             0
 5 Q2           1             0             2             0             0
 6 Q2           1             0             0             0             3
 7 Q2           1             0             0             0             0
 8 Q2           1             0             0             0             0
 9 Q1           2             2             0             0             0
10 Q1           2             0             0             1             0
11 Q1           2             0             0             0             0
12 Q1           2             0             0             0             0
13 Q2           2             0             2             0             0
14 Q2           2             0             0             0             5
15 Q2           2             0             0             0             0
16 Q2           2             0             0             0             0
# ... with 4 more variables: `action 3 Q1` <dbl>, `action 3 Q2` <dbl>, `action 4
#   Q1` <dbl>, `action 4 Q2` <dbl>

I tried this, but got stuck when using spread() and gave the second dataframe example as output.

procedure <- rep(c(rep(1,10), rep(2,10)),2)
Q <- rep(rep(c(rep('Q1',5),rep('Q2',5)),2),2)
action <- rep(rep(paste('action', 1:4),5),2)
df <- data.frame(procedure, Q, action)

library(dplyr)
library(tidyr)

# We can group by procedure, Q and action, and then count the instance with tally().

df_long <- df %>% group_by(procedure, Q, action) %>% tally()
df_long$action.Q <- paste(df_long$action,df_long$Q)

# Now we can use the function spread to create wide dataframe with columns for each combination of Q and action:

df_wide <- df_long %>% spread(action.Q, n, fill=0) %>% select(-c(Q,action))

df_long looks like this (after paste(action.Q)):

# A tibble: 10 x 5
# Groups:   procedure, Q [3]
   procedure Q     action       n action.Q   
      <dbl> <fct> <fct>    <int> <chr>      
 1        1 Q1    action 1     4 action 1 Q1
 2        1 Q1    action 2     2 action 2 Q1
 3        1 Q1    action 3     2 action 3 Q1
 4        1 Q1    action 4     2 action 4 Q1
 5        1 Q2    action 1     2 action 1 Q2
 6        1 Q2    action 2     4 action 2 Q2
 7        1 Q2    action 3     2 action 3 Q2
 8        1 Q2    action 4     2 action 4 Q2
 9        2 Q1    action 1     2 action 1 Q1
10        2 Q1    action 2     2 action 2 Q1

Source: Counting text values across different columns, in to new columns

fleems
  • 109
  • 1
  • 1
  • 10

1 Answers1

1

I'm sure there are better ways, but starting from where you left of:

df_wide <- df_long %>% spread(action.Q, n, fill=0)
df_wide %>%
  group_by(procedure) %>%
  summarize(`action 1 Q1` = sum(`action 1 Q1`), `action 1 Q2` = sum(`action 1 Q2`), `action 2 Q1` = sum(`action 2 Q1`), `action 2 Q2` = sum(`action 2 Q2`))

Note that I changed operatie into procedure in your sample data set.

EDIT: Thanks to Ronak Shah, you can do the summarize in a less manual manner:

df_wide %>%
  group_by(procedure) %>%
  summarize_at(vars(starts_with("action ")), sum)

Note the 'space' after action, to avoid matching the action column itself.

Sven
  • 1,203
  • 1
  • 5
  • 14
  • Hi thanks for your reply. I have 21 actions x 4 Qs, is there a less manual way of writing this up? I'll fiddle a bit with `action.Q` in summarize() – fleems Jun 27 '19 at 11:25
  • 1
    use `summarise_at`, `df_wide %>% group_by(procedure) %>% summarise_at(vars(starts_with("action")), sum)` – Ronak Shah Jun 27 '19 at 11:31
  • More than one way to do it in R: df_wide %>% group_by(procedure) %>% summarise_each(funs(sum), starts_with("action ") – smandape Jun 27 '19 at 18:55
  • Action is the example to work with, thus "action" in reality is a list of 21 different actions (delegating, instructing, didactic et cetera). I could have made that more clear, I was looking for a general solution. But I guess I can subset and merge it with the main data file. – fleems Jun 28 '19 at 13:24