I have a data frame with about 45k rows and 3 columns: weight, persons and population, where population = weight * persons. I want to be able to split the data frame into n-tiles (deciles, centiles, etc.) as needed. The split has to be made so that each n-tile contains the same total population.
That means the data frame needs to be split wherever the running total of population reaches sum(population)/ntile. For example, if ntile = 10, then a = sum(population)/10. I then need to add up the values in the population column row by row until the sum reaches a, split at that point, and continue until I have run through all 45k rows. A sample of the data is below.
weight persons population
1 3687.926 9 33191.337
2 3687.926 16 59006.8217
3 3687.926 7 25815.4847
4 4420.088 5 22100.447
5 4420.088 7 30940.6167
6 4420.088 6 26520.5287
7 3687.926 15 55318.8927
8 3687.926 9 33191.3357
9 3687.926 6 22127.5577
10 4452.829 8 35622.6367
11 4452.829 3 13358.4887
12 4452.829 4 17811.3187
I have been trying to use loops, but I am stuck on splitting the data frame into the n splits needed. I am new to R, so any help is appreciated. My attempt so far:
x = df$population
break_point = sum(x) / 10
ntile_points = 0
for (i in 1:length(x))
{
  while (ntile_points != break_point)
  {
    ntile_points = ntile_points + x[i]
  }
}
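For comparison, the split described above can probably be written without a loop by using a cumulative sum. This is only a minimal sketch under a few assumptions: rows are kept in their existing order, a row whose running total crosses a break point is placed in the next group, and `assign_ntile` is a hypothetical helper name, not an existing R function. The sample rows are rebuilt here with population recomputed as weight * persons, and 4 groups are used because the sample has only 12 rows.

```r
# Sample rows from the question; population is recomputed as weight * persons.
df <- data.frame(
  weight  = c(3687.926, 3687.926, 3687.926, 4420.088, 4420.088, 4420.088,
              3687.926, 3687.926, 3687.926, 4452.829, 4452.829, 4452.829),
  persons = c(9, 16, 7, 5, 7, 6, 15, 9, 6, 8, 3, 4)
)
df$population <- df$weight * df$persons

# Hypothetical helper: label each row with a group index from 1 to n so that
# each group holds roughly sum(population) / n of the total population.
assign_ntile <- function(population, n) {
  cum_pop <- cumsum(population)          # running total, row by row
  break_point <- sum(population) / n     # target population per group
  # A row belongs to group k when its running total lies in
  # ((k - 1) * break_point, k * break_point]; pmin() guards against
  # floating-point rounding pushing the last row into group n + 1.
  pmin(ceiling(cum_pop / break_point), n)
}

df$ntile <- assign_ntile(df$population, 4)

# One data frame per group, in a named list.
pieces <- split(df, df$ntile)
```

Because rows are never divided, the group totals are only approximately equal; with 45k rows and n = 10 or 100 the differences should be small relative to each group's total.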