1

I have a data frame in R that includes city names, years, and months. It looks like the following:

Sample data set with Month, Year, and City columns

This table continues for thousands of records. Some cities do not have data for every month in the year (e.g. the 1920 data for Los Angeles in the example above only contains January and February), meaning they are incomplete. I want to extract out only those years which are complete for a given city (e.g. contains all 12 months for that year, like Toronto in the above example).

I have tried converting it into the zoo::yearmon class, but I do not know how to manipulate it to do what I have described above. I believe that a script could be written that looks at the year and city name, checks if it contains all 12 months, and then omits years that do not.

Matt
  • 11
  • 1
  • 1
    Please share your data using `dput()`. [How to make a great R reproducible example?](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) – Tung Apr 03 '18 at 00:08
  • Can we assume that for `City` + `Year`, a `Month` doesn't occur more than once? For example there is only one Jan, 1922, Toronto? – neilfws Apr 03 '18 at 00:13

2 Answers2

2

Here's a solution using the dplyr package:

df %>% 
  group_by(City, Year) %>% 
  filter(length(unique(Month)) == 12)

I group by City and Year and then filter for those with 12 unique months. (I assume your data frame is called df.)


Now, if you just want a particular city, say Toronto, you could use the following:

df %>% 
  filter(City == "Toronto") %>%
  group_by(Year) %>% 
  filter(length(unique(Month)) == 12)
Dan
  • 11,370
  • 4
  • 43
  • 68
0

Here is an option using data.table

library(data.table)
setDT(df)[, .SD[uniqueN(Month)==12], .(City, Year)]
akrun
  • 874,273
  • 37
  • 540
  • 662
  • 1
    This also worked. Thanks for taking the time to answer! I'm not sure which approach is better though. – Matt Apr 03 '18 at 20:19