I have a dataframe called dta_filter; the main variable of interest is LogEarn for AgeInDays in [20, 25). There are 16 groups for YRSSCH, ranging from 0, 1, ..., 15. An excerpt of the dataframe is shown in the pictures below.
What I aim to do is the following. I will use the Index (0, 0.1) observation in the output dataframe (picture shown below) as an example. First, I need to extract the value corresponding to 0 from the counts array (picture shown below). In this case, the value is 758. Next, I need to calculate the expression j = floor(758*0.1-1.96), where floor() denotes the floor function. Finally, I need to calculate the j-th order statistic of LogEarn for AgeInDays in [20, 25) for the group corresponding to YRSSCH=0 in the dataframe dta_filter, output this value in a new column in the dataframe output (in the same row as (0, 0.1)), and call the column Lower Bound.
I need to do the above for all the Index values in output. To further illustrate, for the (1, 0.5) index in output, the expression is j = floor(202*0.5-1.96), since 202 is the value corresponding to 1 in the counts array, and it is multiplied by 0.5 because 0.5 is the second number in (1, 0.5). Then, I need to calculate the j-th order statistic of LogEarn for AgeInDays in [20, 25) for the group corresponding to YRSSCH=1 (note that the group is dictated by the first number in (1, 0.5)) in the dataframe dta_filter, and then output this value in the Lower Bound column in the dataframe output, in the same row as (1, 0.5).
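To make the arithmetic in the two examples concrete, here is a minimal sketch of the calculation of j and of a 1-based order statistic. The helper names order_statistic_rank and jth_order_statistic and the toy list are made up for illustration; they are not part of my actual code or data.

```python
import math

import numpy as np


def order_statistic_rank(n, q, z=1.96):
    """Return j = floor(n * q - z), the 1-based rank described above."""
    return math.floor(n * q - z)


def jth_order_statistic(values, j):
    """Return the j-th smallest value, counting from 1."""
    k = j - 1  # np.partition counts order statistics from 0
    return np.partition(np.asarray(values), k)[k]


# The two worked examples from the text.
j_first = order_statistic_rank(758, 0.1)   # floor(73.84)
j_second = order_statistic_rank(202, 0.5)  # floor(99.04)
print(j_first, j_second)

# Toy values (made up) to show the 1-based convention.
toy = [5.0, 1.0, 4.0, 2.0, 3.0]
print(jth_order_statistic(toy, 2))
```

So j is 73 for the (0, 0.1) row and 99 for the (1, 0.5) row.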
My code below only does this one at a time. For example, for the (0, 0.1) index:
import math
import numpy as np

l = 1.96  # assumed: the constant subtracted in the expression for j above
k = math.floor(counts[0] * 0.1 - l) - 1
x = dta_filter[dta_filter.YRSSCH == 0]['LogEarn for AgeInDays in [20, 25)'].tolist()
lower_bound = np.partition(np.asarray(x), k)[k]
Note: In the above code, I have a -1 in the expression for k since np.partition(np.asarray(x), k)[k] indexes the first order statistic from 0 rather than from 1.
How can I do the above for all indices in output? I am only a beginner in Python; perhaps a loop might work?
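For reference, the kind of loop I have in mind would look roughly like the sketch below. It runs on made-up toy data, and it assumes that output is indexed by (YRSSCH, quantile) pairs and that counts can be looked up by YRSSCH value; the real dataframes are only shown in the pictures, so these structures are assumptions.

```python
import math

import numpy as np
import pandas as pd

COL = "LogEarn for AgeInDays in [20, 25)"
Z = 1.96

# Toy stand-ins (made-up values) for dta_filter, counts and output.
rng = np.random.default_rng(0)
dta_filter = pd.DataFrame({
    "YRSSCH": np.repeat([0, 1], [50, 40]),
    COL: rng.normal(size=90),
})
counts = dta_filter.groupby("YRSSCH")[COL].count()
output = pd.DataFrame(
    index=pd.MultiIndex.from_product([[0, 1], [0.1, 0.5]])
)

lower_bounds = []
for yrssch, q in output.index:
    # Values of the variable of interest for this YRSSCH group.
    values = dta_filter.loc[dta_filter["YRSSCH"] == yrssch, COL].to_numpy()
    # j = floor(n * q - 1.96); subtract 1 for np.partition's 0-based position.
    k = math.floor(counts.loc[yrssch] * q - Z) - 1
    lower_bounds.append(np.partition(values, k)[k])

output["Lower Bound"] = lower_bounds
print(output)
```

One caveat with this sketch: for a small group or a small quantile, k can become negative, which np.partition would interpret as counting from the end, so that case would need separate handling.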
This is my first post on Stack Overflow. If my question is not clear enough, please let me know and I will edit it to explain further.