Separating list-col values into singular values relative to a condition

Question

Simplified explanation

Convert from long to wide, whilst filling missing values as 17 for 2019 and 16 for 2010, whilst those values in 2010 that match with 2019, then minus their pland value (i.e. 2019-2010). If there are no values for 2019 and its filled in with 17, give that pland value a negative value. Whilst, if 16 is filled for the missing value in 2010 leave the pland value as is, positive.

This should look like table 2.

Table 1: Example of the dataframe in long format

# A tibble: 10 x 4
   year  locality_id landcover  pland
   <chr> <chr>           <int>  <dbl>
 1 2010  L452817             8 0.0968
 2 2010  L452817             9 0.0323
 3 2010  L452817            12 0.613 
 4 2010  L452817            13 0.194 
 5 2010  L452817            14 0.0645
 6 2019  L452817             8 0.0645
 7 2019  L452817             9 0.0645
 8 2019  L452817            12 0.516 
 9 2019  L452817            13 0.194 
10 2019  L452817            14 0.161

Table 2: Expected format of table 2

   locality_id X2010 X2019       pland
1      L452817     8     8 -0.03225806
2      L452817     9     9  0.03225807
3      L452817    12    12 -0.09677420
4      L452817    13    13  0.00000000
5      L452817    14    14  0.09677419
6      L910180     0    17 -0.43750000
7      L910180     8    17 -0.34375000
8      L910180     9    17 -0.03125000
9      L910180    10    17 -0.03125000
10     L910180    11    17 -0.09375000
11     L910180    13    17 -0.06250000

What I have tried:

#set the values of t inot another variable
y <- t
#remove pland from the new variable
y <- y[, -4]

#set from long to wide providing the pland differences from t as another column
y %>%
    group_by(year) %>%
    mutate(row = row_number()) %>%
    tidyr::pivot_wider(names_from = year, values_from = landcover) %>%
    select(-row) %>% mutate(across(`2010`:`2019`, ~if(cur_column() == '2019') 
        replace_na(.x, 17) else replace_na(.x, 16))) %>% mutate(t[t$year %in% 2019,]$pland - t[t$year %in% 2010,]$pland)

# A tibble: 11 x 4
   locality_id `2010` `2019` `t[t$year %in% 2019, ]$pland - t[t$year %in% 2010, ]$pland`
   <chr>        <dbl>  <dbl>                                                       <dbl>
 1 L452817          8      8                                                    -0.0323 
 2 L452817          9      9                                                     0.0323 
 3 L452817         12     12                                                    -0.0968 
 4 L452817         13     13                                                     0      
 5 L452817         14     14                                                     0.0968 
 6 L910180          0     17                                                    -0.373  
 7 L910180          8     17                                                    -0.279  
 8 L910180          9     17                                                     0.485  
 9 L910180         10     17                                                     0.162  
10 L910180         11     17                                                     0.0675 
11 L910180         13     17                                                     0.00202

The problem with my code above, is that it always calculate the differences, it shouldn't calculate differences for those values that have been introduced because of missing values, so when there is either 16 or 17 on either side.

Resources I have tried: One, and two.

reproducible code:

structure(list(year = c(2010L, 2010L, 2010L, 2010L, 2010L, 2010L, 
2010L, 2010L, 2010L, 2010L, 2010L, 2019L, 2019L, 2019L, 2019L, 
2019L), locality_id = c("L452817", "L452817", "L452817", "L452817", 
"L452817", "L910180", "L910180", "L910180", "L910180", "L910180", 
"L910180", "L452817", "L452817", "L452817", "L452817", "L452817"
), landcover = c(8L, 9L, 12L, 13L, 14L, 0L, 8L, 9L, 10L, 11L, 
13L, 8L, 9L, 12L, 13L, 14L), pland = c(0.0967741935483871, 0.032258064516129, 
0.612903225806452, 0.193548387096774, 0.0645161290322581, 0.4375, 
0.34375, 0.03125, 0.03125, 0.09375, 0.0625, 0.0645161290322581, 
0.0645161290322581, 0.516129032258065, 0.193548387096774, 0.161290322580645
)), row.names = c(NA, -16L), class = c("tbl_df", "tbl", "data.frame"
))

Sinh Nguyen · Accepted Answer · 2021-01-24T23:26:36.040

Instead of using dummy variables to identify missing I use a different approach with complete with df is your original data structure.

df %>%
  # fill in the data with missing year so we can compute while data in long format
  complete(year, nesting(locality_id, landcover), fill = list(pland = 0)) %>%
  arrange(desc(year)) %>%
  group_by(locality_id, landcover) %>%
  summarize(
    X2010 = if_else(pland[year == 2010] == 0 , 16L, first(landcover)),
    X2019 = if_else(pland[year == 2019] == 0 , 17L, first(landcover)),
    pland  = pland[year == 2019] - pland[year == 2010]) %>%
  arrange(locality_id, landcover)

Here is the output

   locality_id landcover X2010 X2019   pland
   <chr>           <int> <int> <int>   <dbl>
 1 L452817             8     8     8 -0.0323
 2 L452817             9     9     9  0.0323
 3 L452817            12    12    12 -0.0968
 4 L452817            13    13    13  0     
 5 L452817            14    14    14  0.0968
 6 L910180             0     0    17 -0.438 
 7 L910180             8     8    17 -0.344 
 8 L910180             9     9    17 -0.0312
 9 L910180            10    10    17 -0.0312
10 L910180            11    11    17 -0.0938
11 L910180            13    13    17 -0.0625

I like your approach! Although I require those values that are missing to be filled with `16` or `17`, because they're be converted into `characters` for naming. So, managing a way to keep both years `2010` and `2019` with landcover as values, is great. Again, I like your approach, I tested it to see if it works with both positive and negative values, and it does! just need to figure how to introduce `2010` and `2010` with `NAs` replaced by `16` and `17`. — Lime, Jan 24 '21 at 22:49
I was just trying to figure out how to convert those zeros into `16` and `17`, you beat me to it. Great answer! — Lime, Jan 24 '21 at 23:31

score 0 · Answer 2 · answered Jan 24 '21 at 22:08

Managed to figure it out, although better suggestions are welcome, especially without warnings!

#set the values of t inot another variable
y <- t
#remove pland from the new variable
y <- y[, -4]

#set from long to wide providing the pland differences from t as another column
y %>%
group_by(year) %>%
mutate(row = row_number()) %>%
tidyr::pivot_wider(names_from = year, values_from = landcover) %>%
select(-row) %>% 
mutate(across(`2010`:`2019`, ~if(cur_column() == '2019') replace_na(.x, 17) else replace_na(.x, 16))) %>% 
mutate(ifelse(`2019` == `2010`, t[t$year %in% 2019, ]$pland - t[t$year %in% 2010, ]$pland, -t$pland))

Warning messages: 1: Problem with mutate() input ..1.
i longer object length is not a multiple of shorter object length
i Input ..1 is ifelse(...).
2: In t[t$year %in% 2019, ]$pland - t[t$year %in% 2010, ]$pland :
longer object length is not a multiple of shorter object length

# A tibble: 11 x 4
   locality_id `2010` `2019` `ifelse(...)`
   <chr>        <dbl>  <dbl>         <dbl>
 1 L452817          8      8       -0.0323
 2 L452817          9      9        0.0323
 3 L452817         12     12       -0.0968
 4 L452817         13     13        0     
 5 L452817         14     14        0.0968
 6 L910180          0     17       -0.438 
 7 L910180          8     17       -0.344 
 8 L910180          9     17       -0.0312
 9 L910180         10     17       -0.0312
10 L910180         11     17       -0.0938
11 L910180         13     17       -0.0625

Break down:

Using the code suggestion from here

This creates an id column relative to the grouped column, and repeats for each unique value in group_by()

Then using the next code, from here

This replaces the NAs from 2010 with 16 and those from 2019 with 17

And finally, the ifelse() statement, I was hanging by a thread thinking it would work, and it did!

It selects those values of landcover that are equal to 2019 and 2010 respectively, then it takes their differences by taking the minus of those values. And finally, those values that are not the same, are filled in with the remaining pland values, whilst taking the negative of it.

However I haven't figured out how to tackle those values when 16 appears in 2010, so 2019 pland values remain positive, considering that its always set to negative!

Separating list-col values into singular values relative to a condition

2 Answers2