How to aggregate factor variable?

Question

I have a dataframe which looks like the following:

ID    month    country   count    style
1     2012-02  UK        3        high
1     2012-02  US        10       high
1     2012-02  FR        5        high
etc

Now, I want to aggregate the values over the ID and country variable and, therefore, I use:

aggregated_data = setDT(subset)[, .(Country = list(Country), ID = min(ID), 
count = sum(count), by = list(Model, Month)][]

To get

ID    month    country     count    
1     2012-02  UK, US, FR   18      
etc

However, since my style variable is a factor I do not know how I can incorporate this in the aggregated table. The values of the factor variable is always the same for one ID, so I only need to print the first value of the style variable for the style variable in the aggregated table. Does anyone know how to do this?

Your expected output is ambiguous, what exactly is counted in your resulting table? BTW, have you ever concerned about how to [properly provide data on Stack Overflow?](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example/5963610#5963610). You could do this in a better way than you've done before. — jay.sf, Feb 18 '19 at 09:56

Maurits Evers · Accepted Answer · 2019-02-18T09:40:10.260

You could just use unique, e.g.

df <- setDT(df)
df[, .(country = toString(country), count = sum(count), style = unique(style)), by = list(ID, month)]
#   ID   month    country count style
#1:  1 2012-02 UK, US, FR    18  high

Or using dplyr

df %>%
    group_by(ID, month) %>%
    summarise(
        country = toString(country),
        count = sum(count),
        style = unique(style))
## A tibble: 1 x 5
## Groups:   ID [?]
#     ID month   country    count style
#  <int> <fct>   <chr>      <int> <fct>
#1     1 2012-02 UK, US, FR    18 high

Both methods assume that style is always the same per ID and month.

Sample data

df <- read.table(text =
    "ID    month    country   count    style
1     2012-02  UK        3        high
1     2012-02  US        10       high
1     2012-02  FR        5        high", header = T)

How to aggregate factor variable?

1 Answers1

Sample data