1

I have data about observations across different locations and many years.

location  year  variable  dataentry
1         1970  A         288
1         1970  A         281
1         1970  B         282
2         1970  A         282
2         1971  B         284
2         1971  B         287

I want to know how many locations contributed data in each year, with a result like this:

year  NumberOfLocations
1970  2
1971  1

The "variable" and "dataentry" columns are not important; they only indicate that there were data entries.

I think I made it work by using group_by and summarise:

d1 <- data %>% group_by(location, year) %>% summarise(da = mean(dataentry)) 
d2 <- d1 %>% count(location, year)
d3 <- d2 %>% group_by(year) %>% summarise(NumberOfLocations = sum(n))

But is there a more elegant way to do it?

Johanna

3 Answers

4

With dplyr 1.1.0 or later, using the .by argument:

data %>%
  summarize(num_locations = n_distinct(location), .by = year)

Alternatively, either of the following, which also work with older dplyr versions:

data %>%
  group_by(year) %>%
  summarize(num_locations = n_distinct(location))

data %>%
  distinct(year, location) %>%  
  count(year, name = "num_locations")
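
As a rough check, the two version-independent approaches above can be run on the sample rows from the question. This is only a sketch: the data frame is retyped by hand from the question's table, and the names by_group and by_distinct are made up for illustration.

```r
library(dplyr)

# Sample data retyped from the question (assumed to match the asker's real columns)
data <- data.frame(
  location  = c(1, 1, 1, 2, 2, 2),
  year      = c(1970, 1970, 1970, 1970, 1971, 1971),
  variable  = c("A", "A", "B", "A", "B", "B"),
  dataentry = c(288, 281, 282, 282, 284, 287)
)

# Approach 1: count the distinct locations within each year
by_group <- data %>%
  group_by(year) %>%
  summarize(num_locations = n_distinct(location))

# Approach 2: keep one row per (year, location) pair, then count rows per year
by_distinct <- data %>%
  distinct(year, location) %>%
  count(year, name = "num_locations")

print(by_group)
print(by_distinct)
```

Both results should report 2 locations for 1970 and 1 for 1971, matching the output requested in the question.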
Jon Spring
2
library(tidyverse)

data %>%
  group_by(year) %>%
  summarize(NumberOfLocations = length(unique(location)))

   year NumberOfLocations
  <int>             <int>
1  1970                 2
2  1971                 1
S-SHAAF
1

An alternative with dplyr (the .by argument requires dplyr 1.1.0 or later):

library(dplyr)

# `data` is the data frame from the question
data %>%
  summarize(num_locations = length(unique(location)), .by = year)

  year num_locations
1 1970             2
2 1971             1
TarJae