I have a dataframe called dta_filter. The main variable of interest is LogEarn for AgeInDays in [20, 25). There are 16 groups of YRSSCH, taking the values 0, 1, ..., 15. An excerpt of the dataframe is shown in the pictures below.

What I aim to do is the following, using the (0, 0.1) index in the output dataframe (picture shown below) as an example. First, I need to extract the value corresponding to 0 from the counts array (picture shown below). In this case, the value is 758. Next, I need to calculate j = floor(758*0.1-1.96), where floor() denotes the floor function. Finally, I need to calculate the j-th order statistic of LogEarn for AgeInDays in [20, 25) for the group with YRSSCH=0 in the dataframe dta_filter, and write this value to a new column of the dataframe output called Lower Bound, in the same row as (0, 0.1).

I need to do the above for every index value in output. To illustrate further, take the (1, 0.5) index in output. Here j = floor(202*0.5-1.96), since 202 is the value corresponding to 1 in the counts array, and it is multiplied by 0.5 because 0.5 is the second number in (1, 0.5). Then I need to calculate the j-th order statistic of LogEarn for AgeInDays in [20, 25) for the group with YRSSCH=1 (the group is dictated by the first number in (1, 0.5)) in the dataframe dta_filter, and write this value to the Lower Bound column of output, in the same row as (1, 0.5).
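To make the arithmetic in the two examples concrete, here is a small sketch of the rank calculation. The function name `order_stat_rank` and its parameter names are made up for illustration; only the formula floor(n*q - 1.96) comes from the description above.

```python
import math

def order_stat_rank(n, q, z=1.96):
    """1-based rank j = floor(n*q - z) of the order statistic to pick."""
    return math.floor(n * q - z)

# The two cases described above: counts of 758 and 202.
j_first = order_stat_rank(758, 0.1)   # floor(75.8 - 1.96)  = 73
j_second = order_stat_rank(202, 0.5)  # floor(101.0 - 1.96) = 99
print(j_first, j_second)
```

So the (0, 0.1) row needs the 73rd smallest value in the YRSSCH=0 group, and the (1, 0.5) row needs the 99th smallest value in the YRSSCH=1 group.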

My code below only does this one at a time. For example, for the (0, 0.1) index:

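    import math
    import numpy as np

    col = 'LogEarn for AgeInDays in [20, 25)'
    l = 1.96
    # 0-based position of the j-th order statistic, j = floor(n*0.1 - l)
    k = math.floor(counts[0] * 0.1 - l) - 1
    # values of the variable for the YRSSCH == 0 group
    x = dta_filter.loc[dta_filter.YRSSCH == 0, col].to_numpy()
    lower_bound = np.partition(x, k)[k]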

Note: In the above code, I subtract 1 in the expression for k because NumPy indexing is 0-based, so element k of the partitioned array is the (k+1)-th order statistic.

How can I do the above for all indices in output? I am only a beginner in Python. Perhaps a loop might work?
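A loop along the following lines might be one way to do it. This is only a sketch under several assumptions that the post does not confirm: that output has a two-level MultiIndex of (YRSSCH, quantile) pairs, that counts can be indexed by the YRSSCH value, and that a rank j outside 1..n should give NaN. The function name `add_lower_bound` and the synthetic demo data are made up for illustration.

```python
import math
import numpy as np
import pandas as pd

COL = 'LogEarn for AgeInDays in [20, 25)'

def add_lower_bound(dta_filter, output, counts, z=1.96):
    """Return a copy of `output` with a 'Lower Bound' column added."""
    # Sort each group's values once, so every order statistic is a lookup.
    sorted_vals = {
        group: np.sort(values.to_numpy())
        for group, values in dta_filter.groupby('YRSSCH')[COL]
    }
    bounds = []
    for yrs, q in output.index:               # assumed (YRSSCH, quantile) pairs
        j = math.floor(counts[yrs] * q - z)   # 1-based rank
        vals = sorted_vals[yrs]
        # Assumption: ranks outside 1..n have no order statistic, so use NaN.
        bounds.append(vals[j - 1] if 1 <= j <= len(vals) else np.nan)
    result = output.copy()
    result['Lower Bound'] = bounds
    return result

# Synthetic demo data (not the real dataset): two groups of 50 and 40 rows.
rng = np.random.default_rng(0)
demo = pd.DataFrame({
    'YRSSCH': np.repeat([0, 1], [50, 40]),
    COL: rng.normal(size=90),
})
demo_counts = demo.groupby('YRSSCH').size().to_numpy()
demo_output = pd.DataFrame(
    index=pd.MultiIndex.from_product([[0, 1], [0.1, 0.5]])
)
result = add_lower_bound(demo, demo_output, demo_counts)
print(result)
```

Sorting each group once replaces the repeated `np.partition` calls, which seems simpler when several quantiles are needed per group. If output instead stores the pairs in an ordinary column, the loop would iterate over that column rather than over `output.index`.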

This is my first post on Stack Overflow. If my question is not clear enough, please let me know and I will edit it to explain further.

[Images in the original post: excerpts of the dta_filter dataframe, the output dataframe, and the counts array]

TeTs
  • It isn't entirely clear to me what you're trying to do, but I think you should look into pandas `apply`. See [this question](https://stackoverflow.com/questions/16353729/pandas-how-to-use-apply-function-to-multiple-columns). Note that, depending on whether you want to apply across rows or columns, you'll need to set `axis` accordingly. – Ryan Dec 04 '17 at 19:42
