Create a new variable based on modal number of observations

Question

I am trying to create a new variable that is based on the model number of observations of a separate variable.

Using this df:

help <- data.frame(
id = c(rep(05, times = 8), rep(10, times = 8), rep(12, times = 8)),
episode = c(rep(1, times = 4), rep(2, times =4), rep(3, times = 8), rep(1, times = 4), rep(2, times =4)),
provider = c(rep(70, times = 2), rep(80, times = 2), rep(70, times = 4), rep(30, times = 6), rep(40, times = 2), rep(70, times = 4), rep(10, times = 4)))

I am hoping to create a new variable, provider_mode that is based on the modal provider, or provider with the most observations, per episode.

The end df would look like this:

id episode provider provider_mode
5       1       70       70
5       1       70       70
5       1       80       70
5       1       80       70
5       2       70       70 
5       2       70       70
5       2       70       70
5       2       70       70
10      3       30       30
10      3       30       30
10      3       30       30
10      3       30       30
10      3       30       30
10      3       30       30
10      3       40       30
10      3       40       30
12      1       70       70
12      1       70       70
12      1       70       70
12      1       70       70
12      2       10       10
12      2       10       10
12      2       10       10
12      2       10       10

Here is the code I came up with thus far, but it only gives me the count for each provider within each episode. I need to create a mutate command that puts the provider with the most observations, and if there is a tie, select the first provider (e.g., provider 70 within id 5).

help %>% group_by(id, episode, provider) %>% mutate(provider_count = n())

MrFlick · Answer 1 · 2017-05-17T03:52:02.003

2

The problem is R doesn't really have a mode() function that calculates the statistical mode. We can fake it with dplyr() by calculating the counts and then merging them back in. For example if your data is stored in a variable named dd,

dd %>% 
    group_by(id, episode) %>% 
    count(provider) %>% 
    top_n(1, n) %>% 
    filter(row_number()==1) %>%
    select(-n) %>% 
    rename(provider_mode=provider) %>% 
    right_join(dd)

edited May 17 '17 at 03:52

answered May 17 '17 at 02:16

MrFlick

195,160
17
277
295

I didn't realize `top_n` would return multiple rows in case of ties. I've added an additional filter to return just one value per id. – MrFlick May 17 '17 at 02:41
this works as long as I due `group_by(id, episode)` ... if I don't' group by episode as well, the provider for id 12 stays the same across both episodes. thanks, flick. – b222 May 17 '17 at 02:46

score 2 · Accepted Answer · answered May 17 '17 at 02:20

With dplyr we can group by id and episode and using table we find the frequency of each provider and select the maximum of it with which.max.

library(dplyr)
help %>%
  group_by(id, episode) %>%
  mutate(provider_mode = names(which.max(table(provider))))

#     id  episode provider provider_mode
#   <dbl>   <dbl>    <dbl>         <chr>
#1      5       1       70            70
#2      5       1       70            70
#3      5       1       80            70
#4      5       1       80            70
#5      5       2       70            70
#6      5       2       70            70
#7      5       2       70            70
#8      5       2       70            70
#9     10       3       30            30
#10    10       3       30            30
# ... with 14 more rows

And the similar could be achieved with ave from base R

help$provider_mode <- ave(help$provider, help$id, help$episode, FUN = function(x) 
                                      names(which.max(table(x))))

score 1 · Answer 3 · edited May 23 '17 at 11:55

1

We can create a Mode function as showed here

Mode <- function(x) {
 ux <- unique(x)
  ux[which.max(tabulate(match(x, ux)))]

}

and then with data.table, grouped by 'id', 'episode', we get the 'Mode' of 'provider'

library(data.table)
setDT(help)[, provider_mode := Mode(provider), by = .(id, episode)]

edited May 23 '17 at 11:55

Community

1
1

answered May 17 '17 at 03:01

akrun

874,273
37
540
662

Create a new variable based on modal number of observations

3 Answers3