Find target values in pandas dataframe

Question

I have a multilevel dataframe df. As columns, I have different "objects" I analyze. As rows index , I have a Case ID lc, and time t.

I need to find, for each case lc, the time t (ideally interpolated, but closest value is fine enough) at which each object reached a target value.

This target value is a function of the given object at time t==0.

import pandas as pd
print(pd.__version__)

0.16.2

Dummy data set example:

data = {1: {(1014, 0.0): 20.25,
     (1014, 0.0991): 19.08,
     (1014, 0.1991): 18.43,
     (1014, 0.2991): 19.03,
     (1014, 0.3991): 18.71,
     (1015, 0.0): 20.22,
     (1015, 0.0991): 19.3,
     (1015, 0.1991): 18.68,
     (1015, 0.2991): 18.22,
     (1015, 0.3991): 17.84,
     (1016, 0.0): 21.75,
     (1016, 0.0991): 19.97,
     (1016, 0.1991): 19.65,
     (1016, 0.2991): 19.29,
     (1016, 0.3991): 18.94
    },
 2: {(1014, 0.0): 29.11,
     (1014, 0.0991): 28.68,
     (1014, 0.1991): 28.27,
     (1014, 0.2991): 27.46,
     (1014, 0.3991): 26.96,
     (1015, 0.0): 29.22,
     (1015, 0.0991): 28.64,
     (1015, 0.1991): 28.18,
     (1015, 0.2991): 27.74,
     (1015, 0.3991): 27.25,
     (1016, 0.0): 29.17,
     (1016, 0.0991): 28.68,
     (1016, 0.1991): 28.17,
     (1016, 0.2991): 27.68,
     (1016, 0.3991): 27.18
    },
 3: {(1014, 0.0): 22.01,
     (1014, 0.0991): 21.5,
     (1014, 0.1991): 21.18,
     (1014, 0.2991): 20.58,
     (1014, 0.3991): 20.21,
     (1015, 0.0): 21.81,
     (1015, 0.0991): 21.46,
     (1015, 0.1991): 21.11,
     (1015, 0.2991): 20.78,
     (1015, 0.3991): 20.42,
     (1016, 0.0): 21.82,
     (1016, 0.0991): 21.49,
     (1016, 0.1991): 21.11,
     (1016, 0.2991): 20.75,
     (1016, 0.3991): 20.37
    }}

df = pd.DataFrame(data).sort()
df.index.names=['case', 't']

Dataframe looks thus like:

                 1      2      3
case t                          
1014 0.0000  20.25  29.11  22.01
     0.0991  19.08  28.68  21.50
     0.1991  18.43  28.27  21.18
     0.2991  19.03  27.46  20.58
     0.3991  18.71  26.96  20.21
1015 0.0000  20.22  29.22  21.81
     0.0991  19.30  28.64  21.46
     0.1991  18.68  28.18  21.11
     0.2991  18.22  27.74  20.78
     0.3991  17.84  27.25  20.42
1016 0.0000  21.75  29.17  21.82
     0.0991  19.97  28.68  21.49
     0.1991  19.65  28.17  21.11
     0.2991  19.29  27.68  20.75
     0.3991  18.94  27.18  20.37

Target values are a function of the values at time t==0. typically, this would be k=0.5 for half-time period. For the current sample,we will take k=0.926

Since values are sorted, it is ok to take the first lines for each case.

targets = df.groupby(level='case').first() * 0.926
print(targets)

             1         2         3
case                              
1014  18.75150  26.95586  20.38126
1015  18.72372  27.05772  20.19606
1016  20.14050  27.01142  20.20532

Now, How could I simply build the following dataframe, which shows time t at wich each object reach target value calculated above?

             1         2         3
case                              
1014    0.3991    0.3991    0.2991
1015    0.1991    0.3991    0.3991
1016    0.0991    0.3991    0.3991

Looks like the dataset is sorted, it is always this case? – CT Zhu Aug 16 '15 at 21:16 — CT Zhu, Aug 16 '15 at 21:16
Yes, dataset can be assumed to be already sorted – Nic Aug 17 '15 at 06:44 — Nic, Aug 17 '15 at 06:44

score 1 · Accepted Answer · answered Aug 16 '15 at 21:10

These are somewhat of a hack, let's see if there are better solutions:

In [36]:
targets['t']=0

In [37]:
df2 = df.reset_index().set_index('case') - targets

In [38]:
df3 = df2.groupby(df2.index).transform(lambda x: x.abs()==np.min(x.abs()))

In [39]:
df4 = pd.DataFrame({'1': df2.t[df3[1]],
                    '2': df2.t[df3[2]],
                    '3': df2.t[df3[3]]})

print df4

           1       2       3
case                        
1014  0.3991  0.3991  0.3991
1015  0.1991  0.3991  0.3991
1016  0.0991  0.3991  0.3991

Nice, thanks CT Zhu. For remaining generalist, I would simply change your last step by: _DHT = {c: df2.t[df3[c]] for c in df3.columns}; DHT = pd.DataFrame(_DHT).drop(['t'], axis=1) — Nic, Aug 17 '15 at 07:01

Find target values in pandas dataframe

1 Answers1

Linked