0

EDIT 1: Clarity and typo's, expanded more on example.


I have a dataset with one column (Action) that has text values. I want to count the unique values (for a given Operatie) and place these in a new column, according to their ID (== Operatie). There are 21 unique values within Action.

In the new data set it is important that the new column (which counts a single text value from Action) is linked to the value of Q.Operatie (has values Q1, Q2, Q3, Q4) and Operatie (1:100).

Thus, if we take the first 4 rows in our example then we would have a column named Q1.Delegerend == 2. Whereas the next column would be Q1.Goedaardig == 1. Because we have 2 occurrences of Delegerend and 1 occurrence of Goedaardig for Operation == 1. I ignore Instruerend for this example.

This leads to 4 (Q1.X:Q4.X) columns with each having their unique count of text values within their respected range. Thus, Q1.Delegerend, Q2.Delegerend, Q3.Delegerend, Q4.Delegerend for the row of Operatie == 1 for Delegerend (one of 21 unique). We might need to set up 'transition-matrices', hence the split of the observations.

An example of the raw data is shown below, the new data set will have columns for each unique value in a single row, see the example below the raw data.

   Operatie Tijdstip Berekening.voor.D Minuut.van.de.Operatie Berekening.voor.F Q.Operatie Actor Responder        Action Focus InterTeam
1         1 08:44:56             00:00                      1                1%        Q1      C        OA    Delegerend     1         b
2         1 08:45:43             00:00                      2                2%        Q1     C*        AM    Goedaardig     1         a
3         1 08:46:45             00:01                      3                4%        Q1     OA       OA*   Instruerend     3          
4         1 08:47:10             00:02                      3                4%        Q1      C       OA*    Delegerend     1         b
5         1 08:48:03             00:03                      4                6%        Q1      C      Team  Onderwijzend     1         b
6         1 08:48:44             00:03                      5                7%        Q1      C      Team Bewustwording     1         b
7         1 08:49:28             00:04                      6                8%        Q1     C*         C   Instruerend     1         b
8         1 08:50:30             00:05                      7                9%        Q1      C        C*  Onderwijzend     1         b
9         1 08:50:47             00:05                      7               10%        Q1      C        AM    Delegerend     1         a
10        1 08:51:47             00:06                      8               11%        Q1      C        OA   Instruerend     1         b

Thus, ultimately, I'd like to have one row (Operatie) with 21 columns with a frequency of that unique text (taken from column Action), sorted by the levels of Q.Operatie. Yes, this will lead to a lot of columns, 21 unique values times 4. But that's fine.

   Operatie Minuten Chirurg1 Chirurg2 Q1.Delegerend Q2.Delegerend Q3.Delegerend Q4.Delegerend Q1.Goedaardig
1         1      72       10       11           2          4            5            5
2         2      30       10       11           2          2            6            12
3         3     102        1        2           1          5            12            ...
4         4     212        2       NA           3         13            13
5         5      37        4       NA           1          2            ...
6         6      57        2       NA           3          9
7         7     120        3       NA           1          9
8         8     146        3       NA           1          6
9         9     143        2        9           3         10
10       10     189        9        2           3         12

So I tried making a list for dplyr to work with, see below. I didn't manage to get it to work fluently. I am under the impression that it is possible to call a list to count unique values over, not sure how to write that up using dplyr. I looked at a few posts, but I couldn't find anything about counting multiple rows in order to migrate it to a new dataset. However, the latter is easy enough, I just need the columns.

my_list <- list(unique(sort(obs_IND$Action)))

obs_IND %>% 
count(my_list) %>%
group_by(Operatie) %>%
tally()

Used sources:

fleems
  • 109
  • 1
  • 1
  • 10
  • Why sort "by quartile" rather than by count? Are the quartiles based on the frequency of the strings? Why not just use table()? – Elin May 18 '19 at 12:23
  • As I understand your question, it involves only two columns in the original data frame. Please provide example data that includes only those two columns and the minimum number of row needed to produce a result. You say that you want to end up with 21 columns but your example result does not have 21 columns so that is puzzling. You don't say where the 21 is coming from. Please explain that. In an example if you would change the problem to require a smaller number (or larger only if you use `letters` to create the sample data) that would be good. – Elin May 18 '19 at 13:23
  • It is a bit difficult to understand the question. Could it be that you confused the words "columns" and "rows"? Operatie has 21 values; therefore the result has 21*4 rows right? – otwtm May 18 '19 at 23:22
  • @Elin Hope it's a bit more clear now, the trick is to transpose a bit. I am looking to create a single row (for each value of `Operatie`) that has the tally of all the text values. – fleems May 20 '19 at 10:06
  • .. that has the tally of all the **unique** text values. – fleems May 20 '19 at 10:37
  • One thing is that dplyr is not really designed for lists. – Elin May 21 '19 at 01:50

1 Answers1

1

I created some sample data:

operatie <- rep(c(rep(1,10), rep(2,10)),2)
Q <- rep(rep(c(rep('Q1',5),rep('Q2',5)),2),2)
action <- rep(rep(paste('action', 1:4),5),2)
df <- data.frame(operatie, Q, action)

library(dplyr)
library(tidyr)

We can group by operatie, Q and action, and then count the instance with tally().

df_long <- df %>% group_by(operatie, Q, action) %>% tally()
df_long$action.Q <- paste(df_long$action,df_long$Q)

Now we can use the function spread to create wide dataframe with columns for each combination of Q and action:

df_wide <- df_long %>% spread(action.Q, n, fill=0) %>% select(-c(Q,action))

Result

  Q     operatie `action 1 Q1` `action 1 Q2` `action 2 Q1` ...
  <fct>    <dbl>         <dbl>         <dbl>         <dbl> ...
1 Q1           1             4             0             0 ...
2 Q1           1             0             0             2 ...
3 Q1           1             0             0             0 ...
4 Q1           1             0             0             0 ...
5 Q2           1             0             2             0 ...
6 Q2           1             0             0             0 ...
...
otwtm
  • 1,779
  • 1
  • 16
  • 27