Get frequency of each programming language in each country

Question

I'm working with the following dataset:

  | Country          | HaveWorkedLanguage
1 | United States    | Swift 
2 | United States    | Python
3 | Austria          | JavaScript 
4 | Austria          | JavaScript
5 | United States    | Swift

I'd like to count how often each programming language occurs in each country. The output should look like this:

  | Country          | HaveWorkedLanguage  | Frequency
1 | United States    | Swift               |     2
2 | United States    | Python              |     1      
3 | Austria          | JavaScript          |     2

I already played around with table() but couldn't get it right.

This is a duplicate question, but since the answers in the linked dupe don't include a `table` method, one possibility (with `melt` from reshape2) would be: `res <- setNames(melt(table(df$Country, df$HaveWorkedLanguage)), c("Country", "HaveWorkedLanguage", "Frequency")); res[res$Frequency > 0, ]`. Note that the answers in the linked question are much more straightforward. This is just out of curiosity. - Mike H., Dec 29 '17 at 17:43
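The table() idea from the comment above can also be sketched in base R alone, without melt. This is a minimal sketch that assumes `df` holds the five sample rows from the question; the names `counts` and `res` are only illustrative.

```r
# Sample data reconstructed from the question.
df <- data.frame(
  Country = c("United States", "United States", "Austria", "Austria", "United States"),
  HaveWorkedLanguage = c("Swift", "Python", "JavaScript", "JavaScript", "Swift"),
  stringsAsFactors = FALSE
)

# table() cross-tabulates the two columns; as.data.frame() turns the
# contingency table into long format, one row per combination.
counts <- as.data.frame(table(df), responseName = "Frequency",
                        stringsAsFactors = FALSE)

# The cross-tabulation also contains combinations that never occur
# (e.g. Austria / Swift), so keep only the non-zero counts.
res <- counts[counts$Frequency > 0, ]
rownames(res) <- NULL
res
```

The filtering step is what a bare table() call is missing: it reports every country/language pair, including those with a count of zero.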

score 1 · Answer 1 · answered Dec 29 '17 at 17:28

Using the dplyr library:

library(dplyr)

# Count the rows within each Country / HaveWorkedLanguage group
df %>%
  group_by(Country, HaveWorkedLanguage) %>%
  dplyr::summarize(Frequency = n()) %>%
  as.data.frame()

Hugo Silva

sm925 · Accepted Answer · 2017-12-29T17:39:00.953

Using data.table, add a per-group row count for each Country / HaveWorkedLanguage combination, then drop the duplicated rows with unique():

library(data.table)

df <- data.table(Country = c("United States", "United States", "Austria", "Austria", "United States"),
                 HaveWorkedLanguage = c("Swift", "Python", "JavaScript", "JavaScript", "Swift"))
# .N is the number of rows in each group; := adds it as a new column by reference
df[, Frequency := .N, by = c("Country", "HaveWorkedLanguage")]
df <- unique(df)

This gives the desired output:

   Country         HaveWorkedLanguage Frequency
1: United States   Swift               2
2: United States   Python              1
3: Austria         JavaScript          2
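As a possible variation on the above, the count can be computed as an aggregation, which returns one row per group and so needs no unique() step. This is a sketch rather than the answerer's method; `dt` and `res` are illustrative names, and the data are the sample rows from the question.

```r
library(data.table)

# Sample data reconstructed from the question.
dt <- data.table(
  Country = c("United States", "United States", "Austria", "Austria", "United States"),
  HaveWorkedLanguage = c("Swift", "Python", "JavaScript", "JavaScript", "Swift")
)

# .(Frequency = .N) in j aggregates: the result has one row per
# Country / HaveWorkedLanguage group instead of one row per input row.
res <- dt[, .(Frequency = .N), by = .(Country, HaveWorkedLanguage)]
res
```

The difference is that `:=` adds a column to the existing table and keeps every input row, whereas a list in `j` builds a new, already collapsed table.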

score 0 · Answer 3 · answered Dec 29 '17 at 17:36

Using 'dplyr' makes this an intuitive process. First 'group_by' the variables you want to count within, then compute the summary as follows:

library(dplyr)

df <- tibble(country = c('United States', 'United States', 'Austria', 'Austria', 'United States'),
         haveworkedlang = c('Swift', 'Python', 'JavaScript', 'JavaScript', 'Swift'))

# Group by both columns so that counts are per country and language
df %>%
  group_by(country, haveworkedlang) %>%
  summarize(frequency = n())
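For what it's worth, dplyr also provides count(), which wraps the group-and-summarize pattern in a single call. The following is a minimal sketch on the same sample data; the column name `frequency` is just a choice made here for illustration.

```r
library(dplyr)

# Sample data reconstructed from the question.
df <- tibble(
  country = c("United States", "United States", "Austria", "Austria", "United States"),
  haveworkedlang = c("Swift", "Python", "JavaScript", "JavaScript", "Swift")
)

# count() groups by the listed columns and tallies the rows in each group;
# `name` sets the name of the resulting count column.
res <- df %>%
  count(country, haveworkedlang, name = "frequency")
res
```

Listing both columns matters here: counting by language alone would pool all countries together.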
