I am trying to split a dataset into three groups: developed, developing and under-developed. The cut criterion is quantiles of GDP per capita: developed countries are those above the 75% quantile, developing those between the 50% and 75% quantiles, and under-developed those below the 50% quantile. However, the quantiles differ from year to year.
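# Toy data: two years of GDP per capita for three countries
data = data.frame("country" = c("U.S.A","U.S.A","Jamaica","Jamaica","Congo","Congo"),
                  "year" = c(2000,2001,2000,2001,2000,2001),
                  "gdp_per_capita" = c(30000,40000,100,200,50,60))
# Quantiles of gdp_per_capita by year: one column per year (X2000, X2001), one row per quantile
quantiles = do.call("data.frame",
                    tapply(data$gdp_per_capita, data$year, quantile))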
I calculated the quantiles by year, which gave me a data frame containing just that information. Now I am trying to use it to apply the above criteria within each year.
For example, the cut points change between years: 2000 = (50% = 3000, 75% = 15999), 2001 = (50% = 5000, 75% = 18000).
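To make the per-year cut points concrete, here is a minimal base R sketch of the idea on the toy data above. It assumes the thresholds can be attached to each row with ave(), and the helper columns q50 and q75 are names I made up for illustration. How values exactly on a cut point are treated (here: on the 50% quantile counts as "developing") is also an assumption, since the criteria above do not say.

```r
# Toy data from the question.
data <- data.frame(
  country = c("U.S.A", "U.S.A", "Jamaica", "Jamaica", "Congo", "Congo"),
  year = c(2000, 2001, 2000, 2001, 2000, 2001),
  gdp_per_capita = c(30000, 40000, 100, 200, 50, 60)
)

# ave() applies FUN within each year and returns a vector aligned with the
# rows, so every row carries the cut points of its own year.
# q50 and q75 are hypothetical helper columns.
data$q50 <- ave(data$gdp_per_capita, data$year,
                FUN = function(x) quantile(x, 0.50))
data$q75 <- ave(data$gdp_per_capita, data$year,
                FUN = function(x) quantile(x, 0.75))

# One vectorised ifelse() then classifies all rows at once.
# Assumption: a value exactly on the 50% quantile is "developing".
data$status <- ifelse(data$gdp_per_capita > data$q75, "developed",
               ifelse(data$gdp_per_capita >= data$q50, "developing",
                      "under-developed"))

print(data[, c("year", "country", "gdp_per_capita", "status")])
```

With only three countries per year the toy cut points are not very meaningful, but the same lines should work unchanged on the full dataset.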
Possible results
year country gdp_per_capita status
2000 U.S. 1800000 "developed"
2000 France 200000 "developed"
....more than 500+ obs.
2000 Kenya 300 "under-developed"
2000 Malaysia 1500 "developing"
2001 Malaysia 3000 "developing"
2001 Kenya 500 "under-developed"
2001 Spain 30000 "developed"
2000 India 300 "under-developed"
2001 India 5100 "developing"
What would be the most efficient way to do this? I tried using ifelse and handling each year one by one, but that is too much work, and there seems little point in using a computer if I have to go through the years by hand.
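One possible direction, sketched here rather than offered as a definitive solution, is to let cut() do the labelling inside each year instead of writing the comparisons by hand. The classify() helper below is hypothetical, and the choice of right = FALSE (a value exactly on a cut point falls into the higher group) is an assumption. Note that cut() stops with an error if the two quantiles of a year happen to be equal, which can occur with heavily tied data.

```r
# Toy data from the question.
data <- data.frame(
  country = c("U.S.A", "U.S.A", "Jamaica", "Jamaica", "Congo", "Congo"),
  year = c(2000, 2001, 2000, 2001, 2000, 2001),
  gdp_per_capita = c(30000, 40000, 100, 200, 50, 60)
)

# Hypothetical helper: label one year's values using that year's own
# 50% and 75% quantiles as break points.
classify <- function(x) {
  breaks <- c(-Inf, quantile(x, c(0.50, 0.75)), Inf)
  as.character(cut(x, breaks = breaks,
                   labels = c("under-developed", "developing", "developed"),
                   right = FALSE))  # assumption: intervals are [lower, upper)
}

# split() groups the values by year, lapply() labels each group, and
# unsplit() puts the labels back in the original row order.
by_year <- split(data$gdp_per_capita, data$year)
data$status <- unsplit(lapply(by_year, classify), data$year)

print(data)
```

The appeal of this version is that adding a fourth group would only mean one more quantile and one more label, with no extra nesting.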