Spread returns duplicate identifier error even with unique rows included

Question

I have the following dataframe:

       location asset_status count   row
      <chr>          <chr>  <dbl>  <int>
 1  location1        Owned     1     1
 2  location1    Available     1     2
 3  location1        Owned     1     3
 4  location2        Owned     1     4
 5  location2        Owned     1     5
 6  location2        Owned     1     6
 7  location2        Owned     1     7
 8  location2    no status     1     8
 9  location3        Owned     1     9
10  location3        Owned     1    10

When I try to spread using this, i get the following error:

df <- head(us_can_laptops,10) %>% 
  select(location,asset_status,count) %>% 
  #mutate(row = row_number()) %>% #excluded
  group_by(location) %>% 
  spread(asset_status,count)

Error: Duplicate identifiers for rows (4, 5, 6, 7), (1, 3)

So as per the other questions related to this on SO, I added in a unique identifier with mutate:

df <- head(us_can_laptops,10) %>% 
  select(location,asset_status,count) %>% 
  mutate(row = row_number()) %>%
  group_by(location) %>% 
  spread(asset_status,count)

But that returns this:

    location     row   Available   `no status` Owned
 *        <chr> <int>     <dbl>       <dbl> <dbl>
 1  location2     4        NA          NA     1
 2  location2     5        NA          NA     1
 3  location2     6        NA          NA     1
 4  location2     7        NA          NA     1
 5  location2     8        NA           1    NA
 6  location3    10        NA          NA     1
 7  location3     9        NA          NA     1
 8  location1     1        NA          NA     1
 9  location1     2         1          NA    NA
10  location1     3        NA          NA     1

And additionally, whenever I try a summarize call, it ruins my spread.

Here is the desired outcome:

 location        Available   `no status` Owned
 *      <chr>     <dbl>       <dbl>      <dbl>
 1  location1        1          NA        2
 2  location2       NA          1         4
 3  location3       NA          NA        2

Any help would be appreciated. I know this looks like a duplicate, but the answers to the following linked questions still don't resolve the issue for me: Spread function Error: Duplicate identifiers for rows [duplicate] Spread with duplicate identifiers for rows 1

I'm really looking for a solution while using dplyr, and not dcast

In the expected output where does the value of 4 for row in the first line come from ? — moodymudskipper, Jan 22 '18 at 17:24
oh disregard the row column please. that's just the counter. i will remove it in the actual code. — zabada, Jan 22 '18 at 17:30

score 2 · Accepted Answer · answered Jan 22 '18 at 17:38

2

The following should work (at least gives desired output):

df <- structure(list(location = c("location1", "location1", "location1", 
                                  "location2", "location2", "location2", "location2", "location2", 
                                  "location3", "location3"), asset_status = c("Owned", "Available", 
                                                                              "Owned", "Owned", "Owned", "Owned", "Owned", "no status", "Owned", 
                                                                              "Owned"), count = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), 
                     row = 1:10), row.names = c(NA, -10L), .Names = c("location", 
                                                                      "asset_status", "count", "row"), class = "data.frame")

library(dplyr)
library(tidyr)
df %>% 
  group_by(location, asset_status) %>% 
  summarise(count = sum(count)) %>% 
  spread(key = asset_status, value = count)

answered Jan 22 '18 at 17:38

Tino

2,091
13
15

2

I was posting the same, with `summarize_at("count",sum)` instead, not to repeat the variable name :) – moodymudskipper Jan 22 '18 at 17:42
I do often forget about `summarise_at` when just having to apply a function to only one variable and also have a slight preference for the `summarise`-approach because *I* find it more readable...but maybe that's just me. But I really like combining `summarise_at` in combination with some regex using `vars(matches(...))` :-). – Tino Jan 22 '18 at 17:50
1

Thank you @Tino and Moody_Mudskipper . That worked perfectly. I guess I was overcomplicating the code. – zabada Jan 22 '18 at 17:51
1

Yes it's just a matter of taste here, I just took the opportunity to advertise the function. – moodymudskipper Jan 22 '18 at 22:39

Spread returns duplicate identifier error even with unique rows included

1 Answers1