Error when trying to take the difference between observations within a group

Question

I have groups of 20 observations in which only one observation per group has the value 1 on a variable (`Dich`). I am trying to transform the other variables for the remaining 19 observations relative to that observation, and I am getting the following error:

"longer object length is not a multiple of shorter object length"

library(dplyr)

test <- data.frame('prod_id' = c("shoe", "shoe", "shoe", "shoe", "shoe", "shoe",
                                 "boat", "boat", "boat", "boat", "boat", "boat",
                                 "ship", "ship", "ship", "ship", "ship", "ship"),
                   'seller_id' = c("a", "b", "c", "d", "e", "f",
                                   "a", "g", "h", "r", "q", "b",
                                   "qe", "dj", "d3", "kk", "dn", "de"),
                   'Dich' = c(1, 0, 0, 0, 0, 0,
                              0, 0, 1, 0, 0, 0,
                              0, 0, 0, 0, 0, 0),
                   'price' = c(120, 20, 10, 4, 3, 4,
                               30, 43, 56, 88, 75, 44,
                               32, 21, 44, 54, 55, 33))

Interestingly, this code works:

test2 <- test %>%
     group_by(prod_id) %>%
     mutate(price_diff = if(any(Dich == 1))
                           ((price - price[Dich == 1])/(price + price[Dich == 1])/2)
                         else NA)

While this code

test2 <- test %>% 
     group_by(prod_id) %>%
     mutate(diff_p = if(any(Dich==1)) price - price[Dich == 1] else NA)

is giving me the "longer object length is not a multiple of shorter object length" error. Unfortunately, I wasn't able to reproduce the error with the example data, so I'm hoping someone can see what the issue is.

I saw this post

Longer object length is not a multiple of shorter object length?

but both objects have the same number of rows, and I'm not sure why one version works while a slightly different transformation gives me the error.

For the example posted, it is not showing any error or warning. You may have multiple `Dich` values equal to 1 for some 'prod_id' in the original data. In that case, you may have to rethink the strategy. Try `if(any(Dich==1)) price - price[which(Dich == 1)[1]] else NA` and see whether it works. If it does, the problem is the multiple 1s per group — akrun, Jan 06 '19 at 02:09
Hey @akrun, thanks for the reply. Do you have an example of code I can run to check whether there are multiple 1s for Dich? My understanding of the data is that this is impossible, but I have so many cases that it's not easy to browse. — Kreitz Gigs, Jan 06 '19 at 02:12
Just check the frequency with `table(test[c('prod_id', 'Dich')])` and see whether any prod_id has more than one 1 — akrun, Jan 06 '19 at 02:14
Ah yeah, I did find a couple of groups with more than one 1, which is really odd for this data. I'm guessing these are duplicates. Thanks so much for the help! — Kreitz Gigs, Jan 06 '19 at 02:17
There were duplicates, but there were also groups with two Dich values of 1. Thanks for the solution of using the first flagged row in the group to subtract from! If you want to write it as an answer, I'll "accept" it. Thanks! — Kreitz Gigs, Jan 06 '19 at 02:38
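For data that is too large to browse, the same check can also be done with dplyr. This is a minimal sketch that assumes the question's column names. The `Dich` value in row 8 is changed to 1 (a hypothetical edit) so that "boat" has two flagged rows, and `n_ones` and `problem_groups` are illustrative names.

```r
library(dplyr)

# The question's columns, trimmed to what the check needs.
# Row 8 ("boat") is set to 1 as a hypothetical duplicate flag.
test <- data.frame(
  prod_id = rep(c("shoe", "boat", "ship"), each = 6),
  Dich    = c(1, 0, 0, 0, 0, 0,
              0, 1, 1, 0, 0, 0,
              0, 0, 0, 0, 0, 0)
)

# Count the Dich == 1 rows per product and keep products with more than one
problem_groups <- test %>%
  group_by(prod_id) %>%
  summarise(n_ones = sum(Dich == 1)) %>%
  filter(n_ones > 1)

print(problem_groups)  # one row: boat, n_ones = 2
```

Filtering on `n_ones > 1` targets the groups that break the subtraction. Groups with no flagged row, such as "ship", are already handled by the `else NA` branch.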

score 1 · Accepted Answer · answered Jan 06 '19 at 02:45

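The error happens when more than one row has 'Dich' equal to 1 for some 'prod_id'. If there is exactly one such row, `price[Dich == 1]` has length 1, and that single 'price' is recycled across every row of the group. With more than one such row, `price[Dich == 1]` has length greater than 1, and R recycles it element by element against `price`. If the group size is not a multiple of that length, you get the "longer object length is not a multiple of shorter object length" warning. If it is a multiple, you get no warning, but the differences are silently wrong. In either case, mutate needs the new column to have the same number of rows as the group, so a single reference value per group is what you want.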
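The recycling behaviour described above can be seen in plain base R. This is a minimal sketch that uses small hypothetical `price`/`Dich` vectors rather than the question's data. It shows three cases: one flagged row, two flagged rows in a group whose size is not a multiple of 2 (warning), and two flagged rows in a group whose size is a multiple of 2 (no warning, wrong result).

```r
# Hypothetical group of five prices (not the question's data)
price <- c(10, 20, 30, 40, 50)

# One flagged row: price[Dich == 1] has length 1 and is recycled to every row
Dich_one <- c(1, 0, 0, 0, 0)
clean <- price - price[Dich_one == 1]
print(clean)  # 0 10 20 30 40

# Two flagged rows in a group of five: R recycles c(10, 30) over five
# positions and warns, because 5 is not a multiple of 2
Dich_two <- c(1, 0, 1, 0, 0)
msg <- tryCatch(
  price - price[Dich_two == 1],
  warning = function(w) conditionMessage(w)
)
print(msg)  # "longer object length is not a multiple of shorter object length"

# Two flagged rows in a group of six: 6 is a multiple of 2, so there is
# no warning at all, but rows are compared against alternating references
price6 <- c(10, 20, 30, 40, 50, 60)
Dich6  <- c(1, 0, 1, 0, 0, 0)
silent <- price6 - price6[Dich6 == 1]
print(silent)  # 0 -10 20 10 40 30
```

The third case is the dangerous one. With groups of 20 and exactly two flagged rows, nothing warns, so counting the flagged rows per group is worth doing even when no warning appears.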

So, if the strategy is to take the 'price' from the first row where 'Dich' is 1, you can use `which` and take its first position to extract the 'price':

test %>% 
  group_by(prod_id) %>%
  mutate(diff_p = if(any(Dich==1)) price - price[which(Dich == 1)[1]] else NA)

Or use `which.max`, which on a logical vector returns the position of the first `TRUE`:

test %>% 
  group_by(prod_id) %>%
  mutate(diff_p = if(any(Dich==1)) price - price[which.max(Dich == 1)] else NA)

Or use `match`, which returns the position of the first 1 in 'Dich':

test %>% 
  group_by(prod_id) %>%
  mutate(diff_p = if(any(Dich==1)) price - price[match(1, Dich)] else NA)
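A quick consistency check can confirm that the three alternatives agree. This is a sketch that uses the question's data with `Dich` in row 8 set to 1 (hypothetical) so that "boat" has two flagged rows. All three index the first row where `Dich` is 1, so they should produce identical columns. The `d_*` column names are illustrative, and "ship" has no flagged row, so its rows get `NA`.

```r
library(dplyr)

# Question's data (seller_id dropped); row 8 set to 1 as a hypothetical duplicate
test <- data.frame(
  prod_id = rep(c("shoe", "boat", "ship"), each = 6),
  Dich    = c(1, 0, 0, 0, 0, 0,
              0, 1, 1, 0, 0, 0,
              0, 0, 0, 0, 0, 0),
  price   = c(120, 20, 10, 4, 3, 4,
              30, 43, 56, 88, 75, 44,
              32, 21, 44, 54, 55, 33)
)

res <- test %>%
  group_by(prod_id) %>%
  mutate(
    # First flagged row via which()
    d_which    = if (any(Dich == 1)) price - price[which(Dich == 1)[1]] else NA,
    # which.max() on a logical vector gives the first TRUE
    d_whichmax = if (any(Dich == 1)) price - price[which.max(Dich == 1)] else NA,
    # match() gives the first position where Dich equals 1
    d_match    = if (any(Dich == 1)) price - price[match(1, Dich)] else NA
  ) %>%
  ungroup()

# For "boat", every row is compared against 43 (row 8, the first flagged row)
print(res$d_which[7:12])  # -13 0 13 45 32 1
```

The `any(Dich == 1)` guard matters most for `which.max`. If there is no `TRUE` it returns 1, so without the guard it would silently use the first row of the group.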

