2

I have the following dataframe

print(df1)

        Date    start          end    delta d1   x_s    y_s      z_s    x_f      y_f    z_f
0   09/01/2017  09/01/2017  06/02/2017  28  28  0.989   0.945   0.626   0.191   0.932   0.280
1   10/01/2017  09/01/2017  06/02/2017  27  28  0.989   0.945   0.626   0.191   0.932   0.280
2   11/01/2017  09/01/2017  06/02/2017  26  28  0.989   0.945   0.626   0.191   0.932   0.280
3   12/01/2017  09/01/2017  06/02/2017  25  28  0.989   0.945   0.626   0.191   0.932   0.280
4   13/01/2017  09/01/2017  06/02/2017  24  28  0.989   0.945   0.626   0.191   0.932   0.280
5   14/01/2017  09/01/2017  06/02/2017  23  28  0.989   0.945   0.626   0.191   0.932   0.280
6   15/01/2017  09/01/2017  06/02/2017  22  28  0.989   0.945   0.626   0.191   0.932   0.280
7   16/01/2017  09/01/2017  06/02/2017  21  28  0.989   0.945   0.626   0.191   0.932   0.280
8   17/01/2017  09/01/2017  06/02/2017  20  28  0.989   0.945   0.626   0.191   0.932   0.280
9   18/01/2017  09/01/2017  06/02/2017  19  28  0.989   0.945   0.626   0.191   0.932   0.280

where df1['delta'] = df1['end'] - df1['Date'] and df1['d1'] = df1['end'] - df1['start']. I would like to create 3 new columns that show the interpolated values between the pairs (x_s, x_f), (y_s, y_f), (z_s, z_f).

I have tried the following code

def mapper (name):
     return name+'_i'

ss = list(df1[['x_s', 'y_s', 'z_s']])
fs = list(df1[['x_f', 'y_f', 'z_f' ]])
df2 = pd.DataFrame

for s in ss :
    for f in fs:
         df2[s] = df1[s] + (((df1[f] - df1[s])/df1['d1'])*df1['delta'])

df_conc = pd.concat((df1, df2_new), axis=1)

however when I try to run the nested loops I get the following error:

TypeError: 'type' object does not support item assignment

I wonder what I am doing wrong. I would greatly appreciate any hint or suggestion. Thanks a lot in advance!

second attempt:

ss = ('x', 'y', 'z') 

for s in ss: 
   df1[mapper(s)] = pd.Series((df1[s+'_s'] + ((df1[s+'_f'] - df1[s+'_s'])/(df1['d1']))*df1['delta']), name=mapper(s), index=df1.index)  

but still I do not get 3 new columns which loop through the following pairs (x_s, x_f), (y_s, y_f), (z_s, z_f).

Please let me know if you spot what I am doing wrong, thanks a lot in advance!

clu

3 Answers

1

This should fix it:

for s in ss :
    for f in fs:
        df1[mapper(s)] = pd.Series(df1[s] + (((df1[f] - df1[s])/df1['d1'])*df1['delta']), name=mapper(s), index=df1.index)

I think that does what you want; drop the last concat line. Pandas wants the index passed to it when you add a new column like that (see here).
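For what it's worth, the TypeError in the question most likely comes from `df2 = pd.DataFrame`, which binds the class itself rather than an instance, so `df2[s] = ...` tries to assign an item on a type. A minimal sketch of that, using a made-up two-row frame and a hypothetical `not_a_frame` name:

```python
import pandas as pd

df1 = pd.DataFrame({"x_s": [0.989, 0.989], "x_f": [0.191, 0.191]})

# Binding the class itself (no parentheses) reproduces the error.
not_a_frame = pd.DataFrame
try:
    not_a_frame["x_i"] = df1["x_s"]
except TypeError as exc:
    error_message = str(exc)

# Calling the constructor gives an empty frame that accepts new columns.
df2 = pd.DataFrame(index=df1.index)
df2["x_i"] = df1["x_s"]
```

Writing straight into `df1`, as in the loop above, sidesteps the separate frame entirely.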

Something else you might need is to check the .dtypes of your columns and as needed use pd.to_datetime. This may also be helpful.

I ran the following:

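# the dates in the question are day-first (dd/mm/yyyy), so say so explicitly
df1.end = pd.to_datetime(df1.end, dayfirst=True)
df1.start = pd.to_datetime(df1.start, dayfirst=True)
df1.Date = pd.to_datetime(df1.Date, dayfirst=True)

# assumes delta and d1 are timedelta columns; this turns them into seconds
df1.delta = df1.delta / pd.offsets.Second(1)
df1.d1 = df1.d1 / pd.offsets.Second(1)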
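If `delta` and `d1` need to be rebuilt from the date columns, something along these lines might do. It is only a sketch: the small `dates` frame is made up from the first two rows of the question, and the `%d/%m/%Y` format is an assumption based on 09/01/2017 to 06/02/2017 giving the 28 days shown in `d1`.

```python
import pandas as pd

dates = pd.DataFrame({
    "Date":  ["09/01/2017", "10/01/2017"],
    "start": ["09/01/2017", "09/01/2017"],
    "end":   ["06/02/2017", "06/02/2017"],
})

# Spelling out the format avoids 09/01/2017 being read as 1 September.
for col in ["Date", "start", "end"]:
    dates[col] = pd.to_datetime(dates[col], format="%d/%m/%Y")

# Subtracting datetime columns gives timedeltas; .dt.days turns them into integers.
dates["delta"] = (dates["end"] - dates["Date"]).dt.days
dates["d1"] = (dates["end"] - dates["start"]).dt.days
```

Since the interpolation only uses the ratio delta / d1, days work just as well as seconds here.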
cardamom
  • the code seems to work fine only for the last pair (z_s, z_f), while for the other columns it looks like the loop is only working for the ss list while for the fs list of alternatives it looks stuck on y_f. I can't understand why.. – clu Oct 29 '18 at 15:13
  • It's not giving any error messages, but when I look at the results in the new columns, it seems that the formula `df1[s] + (((df1[f] - df1[s])/df1['d1'])*df1['delta'])` correctly loops through x_s, y_s and z_s for s; however, it always takes the same column z_f in all 3 instances, instead of looping through x_f, y_f, z_f. Let me know if I have not been clear enough. Thanks! – clu Oct 29 '18 at 15:20
  • Would it do what you want if you replaced `s` on the last line with `s+f`? That way you will get 6 columns appended instead of 3. – cardamom Oct 29 '18 at 15:29
  • I believe what I would need is some sort of multiprocessing which will make the 2 loops ss and fs run concurrently – clu Oct 29 '18 at 15:37
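Multiprocessing should not be necessary for this. The nested loops visit all 9 (s, f) combinations and each pass overwrites the column for s, so only the last f (z_f) survives. Walking the two lists in lockstep with `zip` pairs x_s with x_f, y_s with y_f, and z_s with z_f. A minimal sketch, assuming a cut-down three-row version of the question's frame:

```python
import pandas as pd

df1 = pd.DataFrame({
    "delta": [28, 27, 26],
    "d1": [28, 28, 28],
    "x_s": [0.989] * 3, "y_s": [0.945] * 3, "z_s": [0.626] * 3,
    "x_f": [0.191] * 3, "y_f": [0.932] * 3, "z_f": [0.280] * 3,
})

ss = ["x_s", "y_s", "z_s"]
fs = ["x_f", "y_f", "z_f"]

# zip yields one matched (start, final) pair per pass: three passes, not nine.
for s, f in zip(ss, fs):
    # s[0] is the leading letter, so the new columns are x_i, y_i, z_i.
    df1[s[0] + "_i"] = df1[s] + (df1[f] - df1[s]) / df1["d1"] * df1["delta"]
```

This gives exactly three new columns, one per pair.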
1

I don't think you should be looping. Just let numpy do this all for you in a vectorized manner.

ss = df1[['x_s', 'y_s', 'z_s']].values
fs = df1[['x_f', 'y_f', 'z_f']].values
# same formula as in the question: start + (final - start) / d1 * delta
ss2 = ss + ((fs - ss)/df1[['d1']].values)*df1[['delta']].values

Note: I'm sure you can get rid of some of the .values above, but this should illustrate the principle.
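To show where the result ends up, here is a fuller sketch of the vectorized route that also attaches the interpolated values back onto the frame. It assumes the formula from the question (start plus (final minus start) scaled by delta / d1) and a made-up three-row frame; the names `starts`, `finals`, `ratio` and the `_i` column labels are only illustrative.

```python
import pandas as pd

df1 = pd.DataFrame({
    "delta": [28, 27, 26],
    "d1": [28, 28, 28],
    "x_s": [0.989] * 3, "y_s": [0.945] * 3, "z_s": [0.626] * 3,
    "x_f": [0.191] * 3, "y_f": [0.932] * 3, "z_f": [0.280] * 3,
})

starts = df1[["x_s", "y_s", "z_s"]].to_numpy()  # shape (n, 3)
finals = df1[["x_f", "y_f", "z_f"]].to_numpy()  # shape (n, 3)

# Keep the ratio as an (n, 1) column so it broadcasts across the 3 columns.
ratio = (df1["delta"] / df1["d1"]).to_numpy()[:, None]

interpolated = starts + (finals - starts) * ratio

# Wrap the array in a frame that shares df1's index, then append its columns.
df_i = pd.DataFrame(interpolated, columns=["x_i", "y_i", "z_i"], index=df1.index)
df1 = pd.concat([df1, df_i], axis=1)
```

Because the columns are matched by position, the order of the two column lists has to agree (x, y, z in both).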

Dan
  • Hi Dan, I get the following error: `__main__:1: RuntimeWarning: divide by zero encountered in true_divide`. However, there is no division by zero in the dataset. Is there something else I should check? Thanks! – clu Oct 29 '18 at 15:31
0
def mapper (name):
     return name+'_i'

ss = ('x', 'y', 'z') 

for s in ss: 
   df1[mapper(s)] = pd.Series((df1[s+'_s'] + ((df1[s+'_f'] - df1[s+'_s'])/(df1['d1']))*df1['delta']), name=mapper(s), index=df1.index)
clu
  • Thanks for providing code which might help solve the problem, but generally, answers are much more helpful if they include an explanation of what the code is intended to do, and why that solves the problem. – Neuron Oct 30 '18 at 11:05