I'm working with the following dataset:

  | Country          | HaveWorkedLanguage
1 | United States    | Swift 
2 | United States    | Python
3 | Austria          | JavaScript 
4 | Austria          | JavaScript
5 | United States    | Swift

I'd like to count how many times each programming language appears per country. The output should look like this:

  | Country          | HaveWorkedLanguage  | Frequency
1 | United States    | Swift               |     2
2 | United States    | Python              |     1      
3 | Austria          | JavaScript          |     2

I have already tried table() but couldn't get it to produce this output.

sfjac
  • look for "group by" and "count" – MichaelChirico Dec 29 '17 at 17:27
  • This is a duplicate question, but since the answers in the linked dupe don't have a `table` method, one possibility would be: `res <- setNames(melt(table(df$Country, df$HaveWorkedLanguage)), c("Country", "HaveWorkedLanguage", "Frequency")); res[res$Frequency > 0, ]` (with `melt` from reshape2). Note that the answers in the linked question are much more straightforward; this is just out of curiosity. – Mike H. Dec 29 '17 at 17:43
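
The `table()` route from the comment above can also be sketched in base R alone, without `melt`. This is only a minimal sketch: it assumes the data frame is called `df` and has exactly the two columns shown in the question, and the result name `counts` is made up for illustration.

```r
# Rebuild the example data from the question (assumed to be a plain data.frame)
df <- data.frame(
  Country = c("United States", "United States", "Austria", "Austria", "United States"),
  HaveWorkedLanguage = c("Swift", "Python", "JavaScript", "JavaScript", "Swift")
)

# table() cross-tabulates every Country/Language combination;
# as.data.frame() turns the table into long format with a count column
counts <- as.data.frame(table(df), responseName = "Frequency")

# Drop the combinations that never occur in the data
counts <- counts[counts$Frequency > 0, ]
print(counts)
```

Because `table()` counts every possible combination, the zero rows have to be filtered out afterwards, which is probably why it feels awkward for this task.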

3 Answers

Using the dplyr library:

library(dplyr)

# Count the rows in each Country/HaveWorkedLanguage group
df %>%
  group_by(Country, HaveWorkedLanguage) %>%
  dplyr::summarize(Frequency = n()) %>%
  as.data.frame()
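
The same grouping and counting can probably be written more compactly with `count()`. The sketch below assumes a reasonably recent dplyr in which `count()` accepts the `name` argument, and it rebuilds the question's data so that it runs on its own; `counts` is a name chosen here for illustration.

```r
library(dplyr)

# Example data from the question
df <- data.frame(
  Country = c("United States", "United States", "Austria", "Austria", "United States"),
  HaveWorkedLanguage = c("Swift", "Python", "JavaScript", "JavaScript", "Swift")
)

# count() groups by the given columns and counts the rows in each group;
# `name` sets the name of the resulting count column
counts <- df %>%
  count(Country, HaveWorkedLanguage, name = "Frequency")

print(counts)
```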
Hugo Silva

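Using data.table, add a per-group count as a new column and then keep only the unique rows:

library(data.table)
df <- data.table(Country = c("United States", "United States", "Austria", "Austria", "United States"), HaveWorkedLanguage = c("Swift", "Python", "JavaScript", "JavaScript", "Swift"))
# Add the size of each Country/HaveWorkedLanguage group to every row
df[, Frequency := .N, by = c("Country", "HaveWorkedLanguage")]
# Duplicate rows now carry identical counts, so keep one row per group
df <- unique(df)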

This gives the desired output:

   Country         HaveWorkedLanguage Frequency
1: United States   Swift               2
2: United States   Python              1
3: Austria         JavaScript          2
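
As a variation, the count can also be computed as a direct aggregation, which avoids modifying the table by reference and the separate `unique()` step. This is a sketch under the same assumptions about the data; the names `dt` and `counts` are introduced here only for illustration.

```r
library(data.table)

# Example data from the question
dt <- data.table(
  Country = c("United States", "United States", "Austria", "Austria", "United States"),
  HaveWorkedLanguage = c("Swift", "Python", "JavaScript", "JavaScript", "Swift")
)

# .N is the number of rows in each group; wrapping it in .() returns
# one row per Country/HaveWorkedLanguage group with a Frequency column
counts <- dt[, .(Frequency = .N), by = .(Country, HaveWorkedLanguage)]

print(counts)
```

The original `dt` is left unchanged here, which can matter if the row-level data is still needed afterwards.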
sm925

Using 'dplyr' makes this an intuitive process. First 'group_by' the columns you want to count by, then compute the summary as follows:

library(dplyr)

df <- tibble(country = c('United States', 'United States', 'Austria', 'Austria', 'United States'),
         haveworkedlang = c('Swift', 'Python', 'JavaScript', 'JavaScript', 'Swift'))

# Group by both columns so the count is per country and language
df %>%
  group_by(country, haveworkedlang) %>%
  summarize(Frequency = n())
AlphaDrivers