-2

I am looking for a nice and easy way to decrease the values of column 4 based on the evolution of column 3 for every country in column 1 and every year in column 2 respectively. This should be done in R with a data.table object.

Data:

country1,2020,100,1
country1,2025,120,1
country1,2030,140,1
country2,2020,100,1
country2,2025,150,1
country2,2030,180,1

What I want to have is something like this:

Data:

country1,2020,100,1
country1,2025,120,0.8
country1,2030,140,0.6
country2,2020,100,1
country2,2025,150,0.5
country2,2030,180,0.2
  • 1
    what do you mean specifically by "decrease values"? Could you show us the expected output and reformat your data using `dput` so that it is easy to load? See [Minimal Working Example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example). – Vincent Nov 25 '20 at 14:58
  • I included my expected output. – Sebastian.DataAnalyst Nov 25 '20 at 17:05
  • but you did not explain exactly what the relationship between the two columns is. For instance, why are we going from .5 to .42 in the last row? – Vincent Nov 25 '20 at 17:34
  • This is just an example. I do not know the exact relationship yet. I am just looking for a nice and convenient way to decrease this value based on the other value. The exact parameters are irrevant in this case. So ideally there is a solution with lets say linear decay and one with exponential decay and one with some random decay. This would be perfect – Sebastian.DataAnalyst Nov 25 '20 at 19:40
  • It is not irrelevant. The code we suggest will implement one form of decay, so you need to specify a "complete" problem if you want a "complete" answer. – Vincent Nov 25 '20 at 20:46
  • Ok I did changed the values. Getting these values would ne great. Sorry for the imprecise question. – Sebastian.DataAnalyst Nov 25 '20 at 22:05

1 Answers1

2

Here is an approach with dplyr and tidyr:

library(tidyverse)
data %>%
  separate(1, sep = ",", into = c("country","year","var1","var2")) %>%
  mutate(across(year:var2, as.numeric)) %>%
  group_by(country) %>%
  mutate(var2 = var2 * ((2*min(var1))-var1)/100)
## A tibble: 6 x 4
## Groups:   country [2]
#  country   year  var1  var2
#  <chr>    <dbl> <dbl> <dbl>
#1 country1  2020   100   1  
#2 country1  2025   120   0.8
#3 country1  2030   140   0.6
#4 country2  2020   100   1  
#5 country2  2025   150   0.5
#6 country2  2030   180   0.2

Or with data.table:

library(data.table)
setDT(data)
data[, c("country","year","var1","var2") := tstrsplit(V1,",")]
data[,V1 := NULL]
data[,c("year","var1","var2") := lapply(.SD,as.numeric),.SDcol = c("year","var1","var2")]
data[,var2 := .(var2 * (2*min(var1)-var1)/100), by = "country"]
data
#    country year var1 var2
#1: country1 2020  100  1.0
#2: country1 2025  120  0.8
#3: country1 2030  140  0.6
#4: country2 2020  100  1.0
#5: country2 2025  150  0.5
#6: country2 2030  180  0.2
data <- structure(list(V1 = c("country1,2020,100,1", "country1,2025,120,1", 
"country1,2030,140,1", "country2,2020,100,1", "country2,2025,150,1", 
"country2,2030,180,1")), class = "data.frame", row.names = c(NA, 
-6L))
Ian Campbell
  • 23,484
  • 14
  • 36
  • 57