
I am trying to cut the dataset into three parts: developed, developing and under-developed. The cut criterion is quantiles: developed would be countries above the 75% quantile, developing would be between the 50% and 75% quantiles, and under-developed would be below the 50% quantile. However, the quantiles differ by year.

data = data.frame("country" = c("U.S.A","U.S.A","Jamaica","Jamaica","Congo","Congo"),
                  "year" = c(2000,2001,2000,2001,2000,2001),
                  "gdp_per_capita" = c(30000,40000,100,200,50,60))
quantiles = do.call("data.frame",
                    tapply(data$gdp_per_capita, data$year, quantile))

I calculated the quantiles by year, which gave me a data frame containing just that information. Now I am trying to use it to apply the above criteria within each year.
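To make the per-year cut points concrete, here is a minimal sketch that computes only the 50% and 75% quantiles for the sample data above. The name `cut_points` is introduced here for illustration and is not part of the original post.

```r
# Sample data from the question
data <- data.frame(
  country = c("U.S.A", "U.S.A", "Jamaica", "Jamaica", "Congo", "Congo"),
  year = c(2000, 2001, 2000, 2001, 2000, 2001),
  gdp_per_capita = c(30000, 40000, 100, 200, 50, 60)
)

# split() groups the values by year; sapply() computes both quantiles per
# group; t() puts years in rows and the cut points ("50%", "75%") in columns
cut_points <- t(sapply(split(data$gdp_per_capita, data$year),
                       quantile, probs = c(0.50, 0.75)))

print(cut_points)
```

Each row of `cut_points` holds the thresholds for one year, so the same value of `gdp_per_capita` can fall into different groups in different years.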

Example: for 2000 the cut points are (50% = 3000, 75% = 15999); for 2001 they are (50% = 5000, 75% = 18000). The cut points change from year to year.

Possible results

year   country    gdp_per_capita   status
2000   U.S.       1800000          "developed"
2000   France     200000           "developed"
.... more than 500 obs.
2000   Kenya      300              "under-developed"
2000   Malaysia   1500             "developing"
2001   Malaysia   3000             "developing"
2001   Kenya      500              "under-developed"
2001   Spain      30000            "developed"
2000   India      300              "under-developed"
2001   India      5100             "developing"

What would be the most efficient way to resolve this issue? I tried using ifelse and handling each year one by one. That seems like too much work, and there is little point in using a computer if I have to go through the years manually.

Jerald.IFF
  • It's easier to help you if you include a simple [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) with sample input and desired output that can be used to test and verify possible solutions. A `glimpse` does not count as reproducible because we can't copy/paste that into R. – MrFlick Apr 29 '19 at 21:40
  • This might be a duplicate of: [How to quickly form groups (quartiles, deciles, etc) by ordering column(s) in a data frame](https://stackoverflow.com/questions/4126326/how-to-quickly-form-groups-quartiles-deciles-etc-by-ordering-columns-in-a) – divibisan Apr 29 '19 at 21:42
  • It's kind of different, because the quantiles here are different by year. I already looked at that post, which seemed to take quantiles for whole dataset regardless of the type. – Jerald.IFF Apr 29 '19 at 21:48
  • Most likely you just need a `group_by()` and `mutate()` but the sample data you provided is just one year and the same value repeated over and over so that doesn't make it easy to test with. We don't need your real data, just something small and representative of the problem. – MrFlick Apr 29 '19 at 21:52
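The `group_by()` and `mutate()` suggestion in the last comment could look roughly like the sketch below. It assumes the dplyr package is installed; the `result` name and the `case_when()` call are illustrative choices, not something the commenter spelled out.

```r
library(dplyr)

# Sample data from the question
data <- data.frame(
  country = c("U.S.A", "U.S.A", "Jamaica", "Jamaica", "Congo", "Congo"),
  year = c(2000, 2001, 2000, 2001, 2000, 2001),
  gdp_per_capita = c(30000, 40000, 100, 200, 50, 60)
)

result <- data %>%
  group_by(year) %>%   # quantile() below is evaluated separately for each year
  mutate(status = case_when(
    gdp_per_capita >  quantile(gdp_per_capita, 0.75, na.rm = TRUE) ~ "developed",
    gdp_per_capita >= quantile(gdp_per_capita, 0.50, na.rm = TRUE) ~ "developing",
    gdp_per_capita <  quantile(gdp_per_capita, 0.50, na.rm = TRUE) ~ "under-developed"
  )) %>%               # rows with a missing gdp_per_capita get NA as status
  ungroup()

print(result)
```

Because the data are grouped by `year`, each row is compared against its own year's thresholds, which is the difference from computing quantiles over the whole dataset.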

1 Answer


Instead of data.frame, consider passing rbind to do.call so that the quantile percentages become columns, with one row per year. Then merge the result with the original dataset by year. Finally, calculate status with nested ifelse conditional logic.

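### QUANTILES
# tapply() returns one quantile vector per year; rbind stacks them into a
# matrix with one row per year and one column per quantile (0%, 25%, ..., 100%)
quantiles_matrix <- do.call("rbind", tapply(data$gdp_per_capita, data$year, quantile))

# data.frame() renames the columns to X0., X25., X50., X75., X100.
# Convert the row names back to numbers so year has the same type as data$year
quantiles_df <- transform(data.frame(quantiles_matrix),
                          year = as.numeric(row.names(quantiles_matrix)))

### MERGE
# Attach each year's quantiles to every row of that year
mdf <- merge(data, quantiles_df, by = "year")

### STATUS COLUMN ASSIGNMENT
# Rows with a missing gdp_per_capita get NA as status
final_df <- transform(mdf,
  status = ifelse(gdp_per_capita > X75., "developed",
                  ifelse(gdp_per_capita >= X50. & gdp_per_capita <= X75., "developing",
                         ifelse(gdp_per_capita < X50., "under-developed", NA)
                  )
           )
)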

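As a possible alternative that skips the merge step, base R's ave() can broadcast each year's quantile back onto the rows of that year. This is only a sketch under the same cut rules as above; the names q50 and q75 are introduced here for illustration and do not appear in the answer.

```r
# Sample data from the question
data <- data.frame(
  country = c("U.S.A", "U.S.A", "Jamaica", "Jamaica", "Congo", "Congo"),
  year = c(2000, 2001, 2000, 2001, 2000, 2001),
  gdp_per_capita = c(30000, 40000, 100, 200, 50, 60)
)

# ave() returns a vector as long as its input in which every row carries
# the value FUN computed for that row's group (here, its year)
q50 <- ave(data$gdp_per_capita, data$year,
           FUN = function(x) quantile(x, 0.50, na.rm = TRUE))
q75 <- ave(data$gdp_per_capita, data$year,
           FUN = function(x) quantile(x, 0.75, na.rm = TRUE))

# Same rules as the nested ifelse above: above 75% is "developed",
# from 50% up to and including 75% is "developing", below 50% is "under-developed"
data$status <- ifelse(data$gdp_per_capita > q75, "developed",
                      ifelse(data$gdp_per_capita >= q50, "developing",
                             "under-developed"))

print(data)
```

This keeps the original row order and adds no extra quantile columns, at the cost of calling quantile() once per threshold.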

Parfait