
I am trying to learn how to "count by multiple groups" in R using the dplyr library. I generated some data, and now I want to count the number of people for each combination of city and country.

Can someone please tell me if the code I have written is correct?

library(dplyr)

Data_I_Have <- data.frame(
    
    "Country" = c("USA", "USA", "USA", "SPAIN", "SPAIN", "SPAIN", "SPAIN", "SPAIN", "SPAIN", "FRANCE", "UK"),
    "City" = c("Chicago", "Chicago", "Boston", "Madrid", "Madrid", "Madrid", "Barcelona", "Barcelona", "NA", "Paris", "London"),
    "Person" = c("John", "John", "Jim", "Jeff", "Joseph", "Jason", "Justin", "Jake", "Joe", "Jaccob", "Jon")
)

summary = Data_I_Have %>%
    dplyr::group_by(Country, City)%>%
    dplyr::summarise(COUNT = n())

summary = data.frame(summary)

Suppose I wanted to count the number of distinct names instead. Is this code correct?

summary_2 = Data_I_Have %>%
    dplyr::group_by(Country,City)%>%
    dplyr::summarise(UNIQUE_COUNT = n())

Is this correct as well?

Thanks

stats_noob

2 Answers


Try this:

library(dplyr)
# Count distinct names for each Country/City combination
Data_I_Have %>%
    dplyr::group_by(Country, City) %>%
    dplyr::summarise(UNIQUE_COUNT = n_distinct(Person))

With n(), as in your code, you get the number of rows (observations) in each group:

# A tibble: 7 x 3
# Groups:   Country [4]
  Country City      UNIQUE_COUNT
  <chr>   <chr>            <int>
1 FRANCE  Paris                1
2 SPAIN   Barcelona            2
3 SPAIN   Madrid               3
4 SPAIN   NA                   1
5 UK      London               1
6 USA     Boston               1
7 USA     Chicago              2
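For the plain row count, here is a minimal sketch of a shortcut, assuming dplyr 1.0 or later and a column named Person without the leading space. dplyr's count() combines group_by() and summarise(n()) in a single call, and when the input is ungrouped the result is ungrouped too. The object name city_counts is just an illustrative choice.

```r
library(dplyr)

Data_I_Have <- data.frame(
  Country = c("USA", "USA", "USA", "SPAIN", "SPAIN", "SPAIN", "SPAIN", "SPAIN", "SPAIN", "FRANCE", "UK"),
  City    = c("Chicago", "Chicago", "Boston", "Madrid", "Madrid", "Madrid", "Barcelona", "Barcelona", "NA", "Paris", "London"),
  Person  = c("John", "John", "Jim", "Jeff", "Joseph", "Jason", "Justin", "Jake", "Joe", "Jaccob", "Jon")
)

# count() is shorthand for group_by(Country, City) + summarise(COUNT = n());
# on an ungrouped data frame it returns an ungrouped result.
city_counts <- Data_I_Have %>%
  count(Country, City, name = "COUNT")

city_counts
```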

Whereas with n_distinct() you get the number of unique values of Person in each group, so John, who appears twice in Chicago, is counted only once:

# A tibble: 7 x 3
# Groups:   Country [4]
  Country City      UNIQUE_COUNT
  <chr>   <chr>            <int>
1 FRANCE  Paris                1
2 SPAIN   Barcelona            2
3 SPAIN   Madrid               3
4 SPAIN   NA                   1
5 UK      London               1
6 USA     Boston               1
7 USA     Chicago              1
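To see the difference side by side, here is a sketch that computes both counts in one summarise() call. It assumes dplyr 1.0 or later (for the .groups argument) and a column named Person without the leading space; the name person_counts is illustrative only.

```r
library(dplyr)

Data_I_Have <- data.frame(
  Country = c("USA", "USA", "USA", "SPAIN", "SPAIN", "SPAIN", "SPAIN", "SPAIN", "SPAIN", "FRANCE", "UK"),
  City    = c("Chicago", "Chicago", "Boston", "Madrid", "Madrid", "Madrid", "Barcelona", "Barcelona", "NA", "Paris", "London"),
  Person  = c("John", "John", "Jim", "Jeff", "Joseph", "Jason", "Justin", "Jake", "Joe", "Jaccob", "Jon")
)

person_counts <- Data_I_Have %>%
  group_by(Country, City) %>%
  summarise(
    COUNT        = n(),                # rows in each group
    UNIQUE_COUNT = n_distinct(Person), # unique names in each group
    .groups      = "drop"              # return an ungrouped result
  )

person_counts
```

Only the Chicago row differs between the two columns, because it is the only group with a repeated name.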
Duck

Is this what you're looking for?

library(tidyverse)
Data_I_Have <- data.frame(
  
  "Country" = c("USA", "USA", "USA", "SPAIN", "SPAIN", "SPAIN", "SPAIN", "SPAIN", "SPAIN", "FRANCE", "UK"),
  "City" = c("Chicago", "Chicago", "Boston", "Madrid", "Madrid", "Madrid", "Barcelona", "Barcelona", "NA", "Paris", "London"),
  "Person" = c("John", "John", "Jim", "Jeff", "Joseph", "Jason", "Justin", "Jake", "Joe", "Jaccob", "Jon")
)

Data_I_Have
#>    Country      City Person
#> 1      USA   Chicago   John
#> 2      USA   Chicago   John
#> 3      USA    Boston    Jim
#> 4    SPAIN    Madrid   Jeff
#> 5    SPAIN    Madrid Joseph
#> 6    SPAIN    Madrid  Jason
#> 7    SPAIN Barcelona Justin
#> 8    SPAIN Barcelona   Jake
#> 9    SPAIN        NA    Joe
#> 10  FRANCE     Paris Jaccob
#> 11      UK    London    Jon

Data_I_Have %>%
  distinct(Country, City, Person) %>% 
  group_by(Country, City) %>% 
  summarise(n_unique_names=length(Person))
#> `summarise()` regrouping output by 'Country' (override with `.groups` argument)
#> # A tibble: 7 x 3
#> # Groups:   Country [4]
#>   Country City      n_unique_names
#>   <chr>   <chr>              <int>
#> 1 FRANCE  Paris                  1
#> 2 SPAIN   Barcelona              2
#> 3 SPAIN   Madrid                 3
#> 4 SPAIN   NA                     1
#> 5 UK      London                 1
#> 6 USA     Boston                 1
#> 7 USA     Chicago                1

Created on 2020-12-02 by the reprex package (v0.3.0)
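The same distinct-then-count approach can be sketched with count() in place of group_by() plus summarise(length(...)). This assumes dplyr 1.0 or later; the object name n_unique is illustrative only. Once distinct() has dropped duplicate (Country, City, Person) rows, the number of remaining rows per group equals the number of unique names.

```r
library(dplyr)

Data_I_Have <- data.frame(
  Country = c("USA", "USA", "USA", "SPAIN", "SPAIN", "SPAIN", "SPAIN", "SPAIN", "SPAIN", "FRANCE", "UK"),
  City    = c("Chicago", "Chicago", "Boston", "Madrid", "Madrid", "Madrid", "Barcelona", "Barcelona", "NA", "Paris", "London"),
  Person  = c("John", "John", "Jim", "Jeff", "Joseph", "Jason", "Justin", "Jake", "Joe", "Jaccob", "Jon")
)

# distinct() keeps one row per unique (Country, City, Person) combination,
# so counting rows per (Country, City) afterwards counts unique names.
n_unique <- Data_I_Have %>%
  distinct(Country, City, Person) %>%
  count(Country, City, name = "n_unique_names")

n_unique
```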

zoowalk