I have this dataset (image: overview of my dataset). I need to take the SDI variable and make it constant within each country, so that the value from the earliest year is displayed on every entry for that country.

Example:

    **Current:**                                **What I need:**
    |Angola, 2005 -> SDI = 2.0            | Angola, 2005 -> SDI = 2.0
    |Angola, 2013 -> SDI = 3.0            | Angola, 2013 -> SDI = 2.0         
    |Angola, 2017 -> SDI = 2.5            | Angola, 2017 -> SDI = 2.0
    |Argentina, 2005 -> SDI = 8.0         | Argentina, 2005 -> SDI = 8.0
    |Argentina, 2013 -> SDI = 7.0         | Argentina, 2013 -> SDI = 8.0
    |Argentina, 2017 -> SDI = 7.5         | Argentina, 2017 -> SDI = 8.0

So whatever the earliest year for a country is, the SDI value from that year needs to become constant and be displayed for every later year of that country (if that makes sense).

My second dilemma is that I need to take the ODA variable and divide it by the population for each country. I'm wondering if there is a faster way to do this than doing it individually for each country.

So for Afghanistan 2013 the calculation is 5239070000 / AFG_population, and for Angola 2005 it is 1667990000 / AGO_population, and so on for all the countries, as I need to derive their per capita result.

Country = c("Afghanistan", "Algeria","Angola","Argentina","Armenia",
        "Bangladesh","Belarus","Benin","Bhutan","Brazil")
            
year = c("2013", "2017", "2005", "2009", "2005", 
         "2009", "2005", "2005", "2009", "2005" )

ODA = c("5239070000", "2186920000", "1667990000","2102860000",
        "215700000","1943200000","141020000",
        "379850000","163570000","21630120000")

df = data.frame(Country, year, ODA) 

(Short version of my dataset)

I have tried creating subsets and looking at different functions, but I'm completely stumped and not even sure how or where I should look anymore. :(

If anyone can help me with this, I would be forever grateful!

Lex
  • Please don't spam tags. Stack's tag-recommendation system is imperfect, please look at the tags suggested and ensure they are correct. In this case, your question appears to be unrelated to the RStudio IDE (other than possibly running the code inside of it), but the [tag:rstudio] tag is only for questions specific to the IDE itself, not the R programming language. Mouse over each tag and they often have a brief description including question-hints and limitations. – r2evans Apr 25 '23 at 18:15
  • Please do not post (only) an image of code/data/errors: it breaks screen-readers and it cannot be copied or searched (ref: https://meta.stackoverflow.com/a/285557 and https://xkcd.com/2116/). Please include the code, console output, or data (e.g., `data.frame(...)` or the output from `dput(head(x))`) directly into a [code block]. In this case, I'm not going to spend time scraping the data needed to try the second half of your question. – r2evans Apr 25 '23 at 18:17

1 Answer


Your first dilemma is similar to summarize by group (and How to sum a variable by group). I'll address that one.

Here are a few ways:

dplyr

library(dplyr)
quux %>%
  mutate(SDI = SDI[which.min(Year)], .by = Country)
#     Country Year SDI
# 1    Angola 2005   2
# 2    Angola 2013   2
# 3    Angola 2017   2
# 4 Argentina 2005   8
# 5 Argentina 2013   8
# 6 Argentina 2017   8

With dplyr before 1.1.0, use

quux %>%
  group_by(Country) %>%
  mutate(SDI = SDI[which.min(Year)]) %>%
  ungroup()

data.table

library(data.table)
# := assigns by reference and returns invisibly; the trailing [] prints the result
as.data.table(quux)[, SDI := SDI[which.min(Year)], by = Country][]

base R

quux$SDI2 <- with(quux,
  ave(replace(SDI, Year > ave(Year, Country, FUN = min), NA),
      Country, FUN = function(z) na.omit(z)[1])
)
quux
#     Country Year SDI SDI2
# 1    Angola 2005 2.0    2
# 2    Angola 2013 3.0    2
# 3    Angola 2017 2.5    2
# 4 Argentina 2005 8.0    8
# 5 Argentina 2013 7.0    8
# 6 Argentina 2017 7.5    8
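
If the nested `ave()` call above is hard to read, the same result can be reached with a small lookup table. This is only a sketch of an alternative, not a replacement for the code above: it sorts by country and year, keeps the first row per country, and then uses `match()` to copy that SDI value onto every row. The names `sorted` and `earliest` are ones I made up for illustration, and the data is redefined here so the block runs on its own.

```r
# Same sample data as in the Data section, repeated so this block is self-contained.
quux <- data.frame(
  Country = c("Angola", "Angola", "Angola", "Argentina", "Argentina", "Argentina"),
  Year    = c(2005L, 2013L, 2017L, 2005L, 2013L, 2017L),
  SDI     = c(2, 3, 2.5, 8, 7, 7.5)
)

# Sort by country and year, then keep the first (earliest-year) row per country.
sorted   <- quux[order(quux$Country, quux$Year), ]
earliest <- sorted[!duplicated(sorted$Country), c("Country", "SDI")]

# Look up each row's country in the one-row-per-country table.
quux$SDI2 <- earliest$SDI[match(quux$Country, earliest$Country)]
quux
```

Because the lookup is by country name, the original row order of `quux` does not matter.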

Data

quux <- structure(list(Country = c("Angola", "Angola", "Angola", "Argentina", "Argentina", "Argentina"), Year = c(2005L, 2013L, 2017L, 2005L, 2013L, 2017L), SDI = c(2, 3, 2.5, 8, 7, 7.5)), class = "data.frame", row.names = c(NA, -6L))
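
As for the per-capita part of the question: division in R is vectorized, so there should be no need to go country by country once the population sits in the same frame as ODA. The sketch below assumes a separate population table keyed by `Country` and `year`. That table (`pop`) and its numbers are hypothetical placeholders that I made up, not real population figures, since no population data was posted. Note also that `year` and `ODA` are stored as character strings in the question's `df`, so they need converting before any arithmetic.

```r
# First three rows of the question's data; year and ODA are character there.
df <- data.frame(
  Country = c("Afghanistan", "Algeria", "Angola"),
  year    = c("2013", "2017", "2005"),
  ODA     = c("5239070000", "2186920000", "1667990000")
)
df$year <- as.integer(df$year)
df$ODA  <- as.numeric(df$ODA)

# HYPOTHETICAL population table: placeholder values, not real figures.
pop <- data.frame(
  Country    = c("Afghanistan", "Algeria", "Angola"),
  year       = c(2013L, 2017L, 2005L),
  population = c(1e6, 2e6, 4e6)
)

# Join on Country and year, then divide the whole column in one step.
out <- merge(df, pop, by = c("Country", "year"), all.x = TRUE)
out$ODA_per_capita <- out$ODA / out$population
out
```

With `all.x = TRUE`, any country-year missing from the population table is kept with `NA` for the per-capita value rather than being dropped silently.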
r2evans