For loop in R to rewrite initial datasets

Question

UPD: HERE what I need:

Example of some datasets are here (I have 8 of them): https://drive.google.com/drive/folders/1gBV2ZkywW6JqDjRICafCwtYhh2DHWaUq?usp=sharing

What I need is:

For example, in those datasets there is lev variable. Let's say this is a snapshot of the data in these datasets:

ID   Year  lev   
1    2011  0.19 
1    2012  0.19 
1    2013  0.21 
1    2014  0.18 
2    2013  0.39 
2    2014  0.15 
2    2015  0.47 
2    2016  0.35 
3    2013  0.30 
3    2015  0.1  
3    2017  0.13 
3    2018  0.78 
4    2011  0.13 
4    2012  0.35

Now, I need to create in each of my datasets EE_AB, EE_C, EE_H, etc., create variables ff1 and ff2 which are constructed for year ID, in each year respectively to the median of the whole IDs in that particular year.

Let's take an example of the year 2011. The median of the variable lev in this dataset in 2011 is (0.19+0.13)/2 = 0.16, so ff1 for ID 1 in 2011 should be 0.19/0.16 = 1.1875, and for ID 4 in 2011 ff1 = 0.13/0.16 = 0.8125.

Now let's take the example of 2013. The median lev is 0.3. so ff1 for ID 1, 2, 3 will be 0.7, 1.3, 1 respectively.

The desired output should be the ff1 variable in each dataset (e.g., EE_AB, EE_C, EE_H) as:

ID   Year  lev    ff1    
1    2011  0.19  1.1875
1    2012  0.19  0.7037
1    2013  0.21  0.7
1    2014  0.18  1.0909
2    2013  0.39  1.3
2    2014  0.15  0.9091
2    2015  0.47  1.6491
2    2016  0.35  1
3    2013  0.30  1
3    2015  0.1   0.3509
3    2017  0.13  1
3    2018  0.78  1
4    2011  0.13  0.8125
4    2012  0.35  1.2963

And this should be in the same way for other dataframes.

r2evans · Accepted Answer · 2021-02-09T13:39:02.117

Here's a tidyverse method:

library(dplyr)
# library(purrr)
data_frameAB %>%
  group_by(Year) %>%
  mutate(ff1 = (c+d) / purrr::map2_dbl(c, d, median)) %>%
  ungroup()
# # A tibble: 14 x 5
#       ID  Year     c     d   ff1
#    <dbl> <dbl> <dbl> <dbl> <dbl>
#  1     1  2011    10    12  2.2 
#  2     1  2012    11    13  2.18
#  3     1  2013    12    14  2.17
#  4     1  2014    13    15  2.15
#  5     1  2015    14    16  2.14
#  6     1  2016    15    34  3.27
#  7     1  2017    16    25  2.56
#  8     1  2018    17    26  2.53
#  9     1  2019    18    56  4.11
# 10    15  2015    23    38  2.65
# 11    15  2016    26    25  1.96
# 12    15  2017    30    38  2.27
# 13    45  2011   100   250  3.5 
# 14    45  2012   200   111  1.56

Without purrr, that inner expression would be

  mutate(ff1 = (c+d) / mapply(median, c, d))

albeit with type-safeness.

Since you have multiple frames in your data management, I have two suggestions:

Combine them into a list. This recommendation stems off the assumption that whatever you're doing to one frame you are likely to do all three. In that case, you can use lapply or purrr::map on the list of frames, doing all frames in one step. See https://stackoverflow.com/a/24376207/3358227.

list_of_frames <- list(AB=data_frameAB, C=data_frameC, F=data_frameF)
list_of_frames2 <- purrr::map(
  list_of_frames,
  ~ .x %>%
    group_by(Year) %>%
    mutate(ff1 = (c+d) / purrr::map2_dbl(c, d, median)) %>% ungroup()
)

Again, without purrr, that would be

list_of_frames2 <- lapply(
  list_of_frames,
  function(.x) group_by(.x, Year) %>%
    mutate(ff1 = (c+d) / mapply(median c, d)) %>%
    ungroup()
)

Combine them into one frame, preserving the original data. Starting with list_of_frames,

bind_rows(list_of_frames, .id = "Frame") %>%
  group_by(Frame, Year) %>%
  mutate(ff1 = (c+d) / purrr::map2_dbl(c, d, median)) %>%
  ungroup()
# # A tibble: 42 x 6
#    Frame    ID  Year     c     d   ff1
#    <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
#  1 AB        1  2011    10    12  2.2 
#  2 AB        1  2012    11    13  2.18
#  3 AB        1  2013    12    14  2.17
#  4 AB        1  2014    13    15  2.15
#  5 AB        1  2015    14    16  2.14
#  6 AB        1  2016    15    34  3.27
#  7 AB        1  2017    16    25  2.56
#  8 AB        1  2018    17    26  2.53
#  9 AB        1  2019    18    56  4.11
# 10 AB       15  2015    23    38  2.65
# # ... with 32 more rows

That's the tidyverse dialect of R. The `~` tilde is like an anonymous function, so `~ .x` is the same as `function(.x) .x`. The `%>%` is a `magrittr` "pipe" used to syntactically change how code *looks*, more of a pipe of data than parenthetic nesting that is normal for R. The pipe is reexported by the `dplyr` package, and much of the tidyverse flow uses it (often, in practice, in excess, imo). — r2evans, Feb 09 '21 at 13:36
I believe it is not what I wanted. I mean, let's look at the year 2015, for example. There are 2 IDs 1 and 15. So, what I need is ff1 for ID 1 in the year 2015 should be (14+16) / median (14+16 and 23+38), essentially 30/45.5 = 0.65, and not 2.14 — Wadim iLchuk, Feb 09 '21 at 13:55
what if I try to run the following code EE_list <- purrr::map( `EE_list, ~ .x %>% group_by(Year) %>% mutate(ff1 = lev / purrr::map2_dbl(lev, median), ff2 = cash / purrr::map2_dbl(cash, median)) %>% ungroup() )`, where lev is c+d. I get the following error: `Error: Problem with `mutate()` input `ff1`. x argument ".f" is missing, with no default i Input `ff1` is `lev/purrr::map2_dbl(lev, median)`. i The error occurred in group 1: Year = 2011.` — Wadim iLchuk, Feb 09 '21 at 13:56
I hope you'll understand that when your question said `median(c+d)`, I interpreted that to mean `median(c+d)` and not that. This is a great opportunity for you to edit your question and paste your expected output given that sample data; doing so will remove any ambiguity. — r2evans, Feb 09 '21 at 13:58
If you are going to use the suggested code in different ways, then you need to read the help pages. In this case, had you read [`?map2_dbl`](https://purrr.tidyverse.org/reference/map2.html), you would learn that `map2*` and `pmap*` functions *"iterate over multiple arguments simultaneously"*, and therefore require different arguments. — r2evans, Feb 09 '21 at 14:00
I have updated the question, hope this will help to mitigate the ambiguity — Wadim iLchuk, Feb 09 '21 at 14:32

For loop in R to rewrite initial datasets

1 Answers1