1

My dataset consists of housing property IDs and their owners over the years. There is a column for year, property, owner 1 and owner 2.

Sometimes the owners change completely between years e.g. when someone sells their property to someone else.

However, sometimes, the owner 1 stays the same, but an owner 2 is added (e.g. joint ownership because of marriage. Other times owner 1 stays the same but owner 2 is removed (e.g. divorce resulting in joint ownership to no longer exist).

I want to go through all the owners for a single property and identify the instances when owner 1 stays the same and owner 2 is added.

Similarly, I want to go through all the owners for a single property across the years and identify the instances when owner 1 stays the same, but owner 2 is removed from ownership of the property.

Any recommendations on what would work?

# owner added or removed
add_remove <- indiv_data %>%
  group_by(property) %>%
  mutate(added_removed = ifelse(Owner2 != dplyr::lag(Owner2), 1, 0))

I'm new to R and tried the code above but realize that it's not doing what I want to achieve. Ideally the result is an addition to the dataset which identifies if there is no change (0), added (1), or removed (2).

  • 2
    You can do `as.integer(Owner2 != lag(Owner2, default = first(Owner2)))` Not clear without a small reproducible example and expected ouptut – akrun Feb 03 '20 at 20:24
  • 1
    It's easier to help you if you include a simple [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) with sample input and desired output that can be used to test and verify possible solutions. – MrFlick Feb 03 '20 at 20:25

1 Answers1

0

Without sample data this is kind of a stab in the dark, but I’ll give it a try! Creating some dummy data according to your description.

library(dplyr, warn.conflicts = FALSE)
owner_df <-
  purrr::map_dfr(1:5, ~ {
    owner1_names <-
      do.call(paste0, replicate(5, sample(LETTERS, 3, TRUE), FALSE))
    owner2_names <-
      do.call(paste0, replicate(5, sample(LETTERS, 1, TRUE), FALSE))
    tibble(
      property = .,
      year = 2000:2019,
      owner1 = sort(sample(owner1_names, 20, replace = TRUE)),
      owner2 = sample(
        c("-empty-", owner2_names), 20, replace = TRUE
      )
    )
  })

As akrun commented dplyr::lag() is helpful here, to detect when there is change in the owner2 variable. Whenever owner2 and owner2_lag are not the same, a change happened. In order to only detect changes when owner 1 stays the same and owner 2 changes, we include owner 1 in the grouping and create a logical variable for each kind of change that can happen (marriage, divorce, and a third situation where a parter is swapped for another one).

owner_df %>%
  group_by(property, owner1) %>%
  mutate(
    owner2_lagged = lag(owner2, default = first(owner2)),
    change_detected = owner2 != owner2_lagged,
    marriage = change_detected & owner2_lagged == "-empty-",
    divorce = change_detected & owner2 == "-empty-",
    partner_swap = change_detected &
      owner2_lagged != "-empty-" & owner2 != "-empty-"
  )
#> # A tibble: 100 x 9
#> # Groups:   property, owner1 [15]
#>    property  year owner1 owner2 owner2_lagged change_detected marriage divorce
#>       <int> <int> <chr>  <chr>  <chr>         <lgl>           <lgl>    <lgl>  
#>  1        1  2000 OKWBQ  QSSVZ  QSSVZ         FALSE           FALSE    FALSE  
#>  2        1  2001 OKWBQ  -empt… QSSVZ         TRUE            FALSE    TRUE   
#>  3        1  2002 OKWBQ  -empt… -empty-       FALSE           FALSE    FALSE  
#>  4        1  2003 OKWBQ  QSSVZ  -empty-       TRUE            TRUE     FALSE  
#>  5        1  2004 OKWBQ  -empt… QSSVZ         TRUE            FALSE    TRUE   
#>  6        1  2005 OKWBQ  -empt… -empty-       FALSE           FALSE    FALSE  
#>  7        1  2006 OKWBQ  -empt… -empty-       FALSE           FALSE    FALSE  
#>  8        1  2007 OYQEJ  QSSVZ  QSSVZ         FALSE           FALSE    FALSE  
#>  9        1  2008 OYQEJ  QSSVZ  QSSVZ         FALSE           FALSE    FALSE  
#> 10        1  2009 OYQEJ  QSSVZ  QSSVZ         FALSE           FALSE    FALSE  
#> # … with 90 more rows, and 1 more variable: partner_swap <lgl>

Alternatively you can use dplyr::case_when() to create one variable marking the type of change.

owner_df %>%
  group_by(property, owner1) %>%
  mutate(
    owner2_lagged = lag(owner2, default = first(owner2)),
    change_detected = owner2 != owner2_lagged,
    type_of_change = case_when(
      change_detected & owner2_lagged == "-empty-" ~ "marriage",
      change_detected & owner2 == "-empty-" ~ "divorce",
      change_detected &
        owner2_lagged != "-empty-" & owner2 != "-empty-" ~ "partner_swap"
    )
  )
#> # A tibble: 100 x 7
#> # Groups:   property, owner1 [15]
#>    property  year owner1 owner2  owner2_lagged change_detected type_of_change
#>       <int> <int> <chr>  <chr>   <chr>         <lgl>           <chr>         
#>  1        1  2000 OKWBQ  QSSVZ   QSSVZ         FALSE           <NA>          
#>  2        1  2001 OKWBQ  -empty- QSSVZ         TRUE            divorce       
#>  3        1  2002 OKWBQ  -empty- -empty-       FALSE           <NA>          
#>  4        1  2003 OKWBQ  QSSVZ   -empty-       TRUE            marriage      
#>  5        1  2004 OKWBQ  -empty- QSSVZ         TRUE            divorce       
#>  6        1  2005 OKWBQ  -empty- -empty-       FALSE           <NA>          
#>  7        1  2006 OKWBQ  -empty- -empty-       FALSE           <NA>          
#>  8        1  2007 OYQEJ  QSSVZ   QSSVZ         FALSE           <NA>          
#>  9        1  2008 OYQEJ  QSSVZ   QSSVZ         FALSE           <NA>          
#> 10        1  2009 OYQEJ  QSSVZ   QSSVZ         FALSE           <NA>          
#> # … with 90 more rows
Till
  • 3,845
  • 1
  • 11
  • 18
  • This is exactly what I needed, thank you so very much! I should have added a small reproducible sample of my data, and I apologize that I didn't...but thank you so so much for taking out the time to create some and then help me with the code. Appreciate it very very much!!! – user12801590 Feb 04 '20 at 09:04