Spread won't work? Error: Each row of output must be identified by a unique combination of keys

Question

I'm having an issue trying to do something that I feel is quite simple.

I have a data set like this. The years span 1998 to 2017, the states are states in Brazil, and number is the count of fires per month:

year state month     number
<int> <fct> <fct>      <dbl>
1998 Acre  January        0
1998 Acre  February       0
1998 Acre  March          0
1998 Acre  April          0
1998 Acre  May            0
1998 Acre  June           3
1998 Acre  July          37
1998 Acre  August       130
1998 Acre  September    509
1998 Acre  October       44

I want to change it to wide format so that it lists each year and the total count for that year, without the month and state information, so that I can graph the trend over the period. I'm envisioning data like this:

year number
1998 (count)
1999 (count)
2000 (count)

I haven't used R for a little while and am a bit rusty on data manipulation. I've tried a bunch of things, like using spread() and group_by() together, but I keep getting the error:

Error: Each row of output must be identified by a unique combination of keys.

Any ideas as to how I can go about this? Sorry for the super simple question!

Thanks in advance!

Couple things: that's a common error that comes from having duplicates in your data so it isn't clear which rows should translate to which columns. Without seeing your code, however, it's unclear what you did and how you got that error. I'm not sure why you need `spread` to begin with, since what you're showing as your desired output wouldn't require it — camille, Oct 22 '19 at 01:53
This is the code that threw the error: fireCount1998_2017 %>% spread(year, number) — Jams, Oct 22 '19 at 01:58
I was using this during some trial and error on how to achieve the desired outcome I outline in the question. I'm aware that this approach may not be required to achieve the outcome, but thought it was worth mentioning what I had already tried. Any ideas on how I can achieve my outcome? — Jams, Oct 22 '19 at 02:01
If there's more code or explanation to include, you can [edit] it into the question. You probably just want to combine a call to `group_by` with a call to `summarise`, but it's unclear exactly since the data you posted only includes one year and one state. [See here](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) on making an R question that's easy to help with — camille, Oct 22 '19 at 02:03

score 1 · Accepted Answer · answered Oct 22 '19 at 02:06

Suppose your data is something like this:

x <- tibble::tribble(
  ~year, ~state,   ~month, ~number,
   1998, "Acre",   "January", 0,
   1998, "Acre",  "February", 0,
   1998, "Acre",     "March", 0,
   1998, "Acre",     "April", 0,
   1998, "Acre",       "May", 0,
   1998, "Acre",      "June", 3,
   1998, "Acre",      "July", 37,
   1998, "Acre",    "August", 130,
   1998, "Acre", "September", 509,
   1998, "Acre",   "October", 44
  )

Then you could use:

library(dplyr)  # provides %>%, group_by() and summarise()

x %>% group_by(year) %>% summarise(number = sum(number))
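As for the error itself, here is a minimal sketch of how it can arise, assuming the real data has rows that share the same state and month within a year. The `fires` table and its duplicated "Rio" rows are made up for illustration and are not taken from your data set. It also shows that summing by year does not depend on the keys being unique:

```r
library(dplyr)
library(tidyr)

# Hypothetical data: rows 2 and 3 share year, state and month,
# so spread() cannot decide which value goes in the 1998 column.
fires <- tibble::tribble(
  ~year, ~state, ~month,    ~number,
   1998, "Acre", "January",       0,
   1998, "Rio",  "January",       5,
   1998, "Rio",  "January",       7,
   1999, "Acre", "January",       2,
   1999, "Rio",  "January",      11
)

# List the key combinations that occur more than once.
dupes <- fires %>%
  count(year, state, month) %>%
  filter(n > 1)

# spread() stops on the duplicated keys; capture the error instead of halting.
spread_result <- tryCatch(
  fires %>% spread(year, number),
  error = function(e) "error"
)

# Aggregating by year works regardless of duplicates.
yearly <- fires %>%
  group_by(year) %>%
  summarise(number = sum(number))
```

If `dupes` comes back non-empty on your own data, that is probably what `spread()` is complaining about. Whether those rows are true duplicates or separate records that should be added together is something only you can tell from the source.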

Thanks! This worked, I knew I was possibly thinking about it the wrong way — Jams, Oct 22 '19 at 03:31
