R count observations in two groups - shorter solution?

Question

I have data about observations across different locations and many years.

location  year  variable  dataentry
1         1970  A         288
1         1970  A         281
1         1970  B         282
2         1970  A         282
2         1971  B         284
2         1971  B         287

I want to know how many locations contributed data in each year, looking like this:

year  NumberOfLocations
1970  2
1971  1

The columns "variable" and "dataentry" are not important. They only indicate that there were data entries.

I think I made it work by using group_by and summarise:

d1 <- data %>% group_by(location, year) %>% summarise(da = mean(dataentry)) 
d2 <- d1 %>% count(location, year)
d3 <- d2 %>% group_by(year) %>% summarise(NumberOfLocations = sum(n))

But is there a more elegant way to do it?

Jon Spring · Accepted Answer · 2023-03-10T18:15:28.200

4

With dplyr 1.1.0 or later:

data %>%
  summarize(num_locations = n_distinct(location), .by = year)

Or alternatively, either of the following (both work with older dplyr too):

data %>%
  group_by(year) %>%
  summarize(num_locations = n_distinct(location))

data %>%
  distinct(year, location) %>%  
  count(year, name = "num_locations")
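To try these on the data from the question, here is a minimal reproducible sketch. It assumes the sample rows are typed in by hand as a data frame named `data`, with column types guessed from the table shown in the question:

```r
library(dplyr)

# Sample rows copied from the question (column types are an assumption)
data <- data.frame(
  location  = c(1, 1, 1, 2, 2, 2),
  year      = c(1970, 1970, 1970, 1970, 1971, 1971),
  variable  = c("A", "A", "B", "A", "B", "B"),
  dataentry = c(288, 281, 282, 282, 284, 287)
)

# Count the distinct locations that appear in each year
result <- data %>%
  group_by(year) %>%
  summarize(num_locations = n_distinct(location))

result
```

This should give 2 locations for 1970 and 1 for 1971, matching the desired output in the question.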

edited Mar 10 '23 at 18:15

answered Mar 10 '23 at 18:06

Jon Spring


S-SHAAF · Answer 2 · 2023-03-10T18:49:45.943

2

library(tidyverse)

data %>%
  group_by(year) %>%
  summarize(NumberOfLocations = length(unique(location)))

   year NumberOfLocations
  <int>             <int>
1  1970                 2
2  1971                 1
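As a side note, `length(unique(x))` and `n_distinct(x)` give the same count, including for missing values, which both treat as one distinct value by default. A small sketch on a made-up vector `x` (hypothetical, not from the question):

```r
library(dplyr)

# Hypothetical vector with a duplicate and a missing value
x <- c(1, 1, 2, NA)

length(unique(x))            # NA counts as one distinct value
n_distinct(x)                # same count as length(unique(x))
n_distinct(x, na.rm = TRUE)  # drops NA before counting
```

If `location` can contain missing values and those should not count as a location, `n_distinct(location, na.rm = TRUE)` may be the more convenient choice.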

edited Mar 10 '23 at 18:49

answered Mar 10 '23 at 18:07

S-SHAAF


score 1 · Answer 3 · answered Mar 10 '23 at 18:14

1

An alternative with dplyr (the .by argument requires dplyr 1.1.0 or later):

library(dplyr)
data %>%
  summarize(num_locations = length(unique(location)), .by = year)

  year num_locations
1 1970             2
2 1971             1

answered Mar 10 '23 at 18:14

TarJae

