I ran the following code a few months ago and it worked fine:

ceo1_nochange <- ceo1 %>% 
  group_by(ISIN, year) %>% 
  nest(.key = "OTHER_DATA") %>% 
  group_by(ISIN) %>% 
  mutate(OTHER_DATA_LAG = lag(OTHER_DATA, 1), 
         OTHER_DATA_LEAD = lead(OTHER_DATA, 1), 
         KEEP = pmap(list(OTHER_DATA_LAG, OTHER_DATA, OTHER_DATA_LEAD), function(x, y, z) {
           isTRUE(all_equal(x["DirectorID"], y["DirectorID"])) ||
             isTRUE(all_equal(y["DirectorID"], z["DirectorID"]))
         })) %>% 
  filter(unlist(KEEP)) %>% 
  select(-OTHER_DATA_LAG, -OTHER_DATA_LEAD, -KEEP) %>% 
  unnest() %>% 
  ungroup()

My purpose was to identify those observations in which DirectorID did not change from year to year.

But now I get the following error:

Error: Problem with `mutate()` input `KEEP`.
x argument is of length zero
i Input `KEEP` is `pmap(...)`.
i The error occurred in group 1: ISIN = "AN8068571086".
Run `rlang::last_error()` to see where the error occurred.
In addition: Warning message:
 Error: Problem with `mutate()` input `KEEP`.
x argument is of length zero
i Input `KEEP` is `pmap(...)`.
i The error occurred in group 1: ISIN = "AN8068571086".
Run `rlang::last_error()` to see where the error occurred.

Can anybody shed some light?

This is a sample dataset:

"ROW,ISIN,YEAR,DIRECTOR_NAME,DIRECTOR_ID
1,US9898171015,2006,Thomas (Tom) E Davin,2247441792
2,US9898171015,2006,Matthew (Matt) L Hyde,4842568996
3,US9898171015,2007,James (Jim) M Weber,3581636766
4,US9898171015,2007,Matthew (Matt) L Hyde,4842568996
5,US9898171015,2007,David (Dave) M DeMattei,759047198
6,US9898171015,2008,James (Jim) M Weber,3581636766
7,US9898171015,2008,Matthew (Matt) L Hyde,4842568996
8,US9898171015,2008,David (Dave) M DeMattei,759047198
9,US9898171015,2009,William (Bill) Milroy Barnum Jr,20462211719
10,US9898171015,2009,James (Jim) M Weber,3581636766
11,US9898171015,2009,Matthew (Matt) L Hyde,4842568996
12,US9898171015,2009,David (Dave) M DeMattei,759047198
13,US9898171015,2010,William (Bill) Milroy Barnum Jr,20462211719
14,US9898171015,2010,James (Jim) M Weber,3581636766
15,US9898171015,2010,Matthew (Matt) L Hyde,4842568996
16,US9898171015,2011,Sarah (Sally) Gaines McCoy,11434863691
17,US9898171015,2011,William (Bill) Milroy Barnum Jr,20462211719
18,US9898171015,2011,James (Jim) M Weber,3581636766
19,US9898171015,2011,Matthew (Matt) L Hyde,4842568996
20,US9898171015,2012,Sarah (Sally) Gaines McCoy,11434863691
21,US9898171015,2012,Ernest R Johnson,40425210975
22,US9898171015,2013,Sarah (Sally) Gaines McCoy,11434863691
23,US9898171015,2013,Ernest R Johnson,40425210975
24,US9898171015,2013,Travis D Smith,53006212569
25,US9898171015,2014,Sarah (Sally) Gaines McCoy,11434863691
26,US9898171015,2014,Ernest R Johnson,40425210975
27,US9898171015,2014,Travis D Smith,53006212569
28,US9898171015,2015,Kalen F Holmes,11051172801
29,US9898171015,2015,Sarah (Sally) Gaines McCoy,11434863691
30,US9898171015,2015,Ernest R Johnson,40425210975
31,US9898171015,2015,Travis D Smith,53006212569
32,US9898171015,2016,Sarah (Sally) Gaines McCoy,11434863691
33,US9898171015,2016,Ernest R Johnson,40425210975
34,US9898171015,2016,Travis D Smith,53006212569
35,US9898171015,2017,Sarah (Sally) Gaines McCoy,11434863691
36,US9898171015,2017,Scott Andrew Bailey,174000000000
37,US9898171015,2017,Ernest R Johnson,40425210975
38,US9898171015,2017,Travis D Smith,53006212569
" 

Can someone provide a clue?

Sharif

1 Answer

I didn't find anything in the code that might be affected by recent changes. The error comes from the lag and lead functions. Applied to the nested OTHER_DATA column, lag leaves a NULL value in the first row of each group and lead leaves one in the last row, so the comparison inside pmap receives an empty argument. If you add a check for that inside the pmap call, it should work.
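To see those edge values directly, here is a minimal sketch on a toy list. The two one-column data frames are made up for illustration, and the behaviour is assumed to hold for dplyr 1.0 or later.

```r
library(dplyr)

# Toy list standing in for the nested OTHER_DATA column (illustrative only)
x <- list(data.frame(id = 1), data.frame(id = 2))

# Flag the positions that lag()/lead() fill with NULL
lag_is_null  <- vapply(lag(x),  is.null, logical(1))
lead_is_null <- vapply(lead(x), is.null, logical(1))

print(lag_is_null)   # the first position is expected to be the NULL one
print(lead_is_null)  # the last position is expected to be the NULL one
```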

I made some other changes to the code as well:

  • .key has been deprecated in nest, so I used nest(OTHER_DATA = c(ROW, DIRECTOR_NAME, DIRECTOR_ID)) instead.
  • Used pmap_lgl (instead of pmap) so that you don't have to call unlist(KEEP) in filter.
  • unnest needs the column to unnest named explicitly, so I used unnest(cols = c(OTHER_DATA)).

library(tidyverse)

ceo1 %>% 
  group_by(ISIN, YEAR) %>% 
  nest(OTHER_DATA = c(ROW, DIRECTOR_NAME, DIRECTOR_ID)) %>% 
  group_by(ISIN) %>% 
  mutate(OTHER_DATA_LAG = lag(OTHER_DATA, 1), 
         OTHER_DATA_LEAD = lead(OTHER_DATA, 1),
         KEEP = pmap_lgl(list(OTHER_DATA_LAG, OTHER_DATA, OTHER_DATA_LEAD), function(x, y, z) {
           # lag()/lead() leave NULL at the group edges, so compare only
           # when all three nested data frames are present
           if (length(x) > 0 && length(y) > 0 && length(z) > 0) {
             isTRUE(all_equal(x["DIRECTOR_ID"], y["DIRECTOR_ID"])) ||
               isTRUE(all_equal(y["DIRECTOR_ID"], z["DIRECTOR_ID"]))
           } else {
             FALSE
           }
         })) %>% 
  filter(KEEP) %>% 
  select(-OTHER_DATA_LAG, -OTHER_DATA_LEAD, -KEEP) %>% 
  unnest(cols = c(OTHER_DATA)) %>% 
  ungroup()

#   ISIN          YEAR   ROW DIRECTOR_NAME              DIRECTOR_ID
#   <chr>        <int> <int> <chr>                            <dbl>
# 1 US9898171015  2007     3 James (Jim) M Weber         3581636766
# 2 US9898171015  2007     4 Matthew (Matt) L Hyde       4842568996
# 3 US9898171015  2007     5 David (Dave) M DeMattei      759047198
# 4 US9898171015  2008     6 James (Jim) M Weber         3581636766
# 5 US9898171015  2008     7 Matthew (Matt) L Hyde       4842568996
# 6 US9898171015  2008     8 David (Dave) M DeMattei      759047198
# 7 US9898171015  2013    22 Sarah (Sally) Gaines McCoy 11434863691
# 8 US9898171015  2013    23 Ernest R Johnson           40425210975
# 9 US9898171015  2013    24 Travis D Smith             53006212569
#10 US9898171015  2014    25 Sarah (Sally) Gaines McCoy 11434863691
#11 US9898171015  2014    26 Ernest R Johnson           40425210975
#12 US9898171015  2014    27 Travis D Smith             53006212569
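
One thing to be aware of: the guard above returns FALSE whenever any of the three values is missing, so the first and last year of every ISIN are always dropped, even if they match their only neighbour. If that is not what you want, a possible variant is to guard each comparison separately. This is only a sketch under some assumptions: `same_directors` is a hypothetical helper (not part of dplyr), it uses base `setequal` in place of `all_equal` and therefore ignores duplicate IDs within a year, and `ceo1` here is just rows 20 to 27 of the sample data.

```r
library(dplyr)
library(tidyr)
library(purrr)

# Rows 20-27 of the sample data (years 2012-2014 only)
ceo1 <- read.csv(text = "ROW,ISIN,YEAR,DIRECTOR_NAME,DIRECTOR_ID
20,US9898171015,2012,Sarah (Sally) Gaines McCoy,11434863691
21,US9898171015,2012,Ernest R Johnson,40425210975
22,US9898171015,2013,Sarah (Sally) Gaines McCoy,11434863691
23,US9898171015,2013,Ernest R Johnson,40425210975
24,US9898171015,2013,Travis D Smith,53006212569
25,US9898171015,2014,Sarah (Sally) Gaines McCoy,11434863691
26,US9898171015,2014,Ernest R Johnson,40425210975
27,US9898171015,2014,Travis D Smith,53006212569", stringsAsFactors = FALSE)

# Hypothetical helper: TRUE only when both nested data frames exist
# and contain the same set of DIRECTOR_ID values
same_directors <- function(a, b) {
  if (!is.data.frame(a) || !is.data.frame(b)) return(FALSE)
  setequal(a$DIRECTOR_ID, b$DIRECTOR_ID)
}

ceo1_nochange <- ceo1 %>%
  nest(OTHER_DATA = c(ROW, DIRECTOR_NAME, DIRECTOR_ID)) %>%
  group_by(ISIN) %>%
  arrange(YEAR, .by_group = TRUE) %>%
  # Keep a year if it matches the previous year OR the next year
  mutate(KEEP = pmap_lgl(
    list(lag(OTHER_DATA), OTHER_DATA, lead(OTHER_DATA)),
    function(x, y, z) same_directors(x, y) || same_directors(y, z)
  )) %>%
  ungroup() %>%
  filter(KEEP) %>%
  select(-KEEP) %>%
  unnest(cols = c(OTHER_DATA))

print(ceo1_nochange)
```

On this subset, 2014 is the last year and has the same directors as 2013, so the variant keeps both years, whereas the all-three guard would keep only 2013. On the full sample data both approaches should give the same four years, because neither the first nor the last year matches its neighbour.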
Ronak Shah
  • Thanks. If I want to keep only those DirectorIDs that remain the same from year to year, where should I change the code? For example, in the output of your final code, 2007 and 2008 were kept because DirectorID did not change in those years for ISIN US9898171015. But in my sample data, DirectorIDs 3581636766 (James (Jim) M Weber), 4842568996 (Matthew (Matt) L Hyde) and 759047198 (David (Dave) M DeMattei) appear in both 2008 and 2009. 2009 was dropped because a new DirectorID appeared that year. Can I keep all DirectorIDs that remain the same from year to year? – Sharif Mar 02 '21 at 15:46
  • I see that you asked a new question for that and received an answer for it. – Ronak Shah Mar 02 '21 at 21:41
  • When I run your code, which I accepted as the answer, my observations are a little different from those of my old code. This small difference changes my results. It would be great if I could run my old code. Do you have any idea how I can run it? – Sharif Mar 06 '21 at 14:29
  • As I mentioned in the answer, I couldn't find anything in your code that might be affected by any recent change in the libraries. However, I might be wrong, since `tidyverse` has been changing a lot recently. – Ronak Shah Mar 06 '21 at 14:34
  • Got it. Thanks for all your help. – Sharif Mar 06 '21 at 14:36