I have the following code, which was written for me. It works in a separate, new document, but when I apply it to my dataframe, which has a datetime index, it fails with an error. Any idea how to modify it so it works on a dataframe with a datetime index? Thank you in advance!

The error I get, pointing at the last line of the code below, is:

TypeError: Addition/subtraction of integers and integer-arrays with Timestamp is no longer supported. Instead of adding/subtracting n, use `n *

import pandas as pd 
import numpy as np

Z30 = [1,1.2,0.85,0.50,-0.50,-1.20,-1.85,0.75,1.5,2]

df = pd.DataFrame(Z30)
df.columns = ['Z30']
df['Z30DELTA'] = 0

def condition(df,i):
    if  df.loc[i, 'Z30'] > 1 or  df.loc[i, 'Z30'] < -1:
        return 1
    else:
        return 0

for i in range(1, len(df)):
   
    df.loc[i, 'Z30DELTA'] = df.loc[i-1, 'Z30DELTA'] + df.loc[i, 'Z30']* condition(df,i)
fxbaba108
  • What is the error you're getting? Please include that in the post. – sushanth Apr 24 '21 at 02:39
  • added. Thank you for the observation @sushanth – fxbaba108 Apr 24 '21 at 02:42
  • Here is a similar post; does this answer the question? https://stackoverflow.com/q/61153546/4985099 – sushanth Apr 24 '21 at 02:47
  • No, not really helpful – fxbaba108 Apr 24 '21 at 02:57
  • So... if I'm not mistaken, you've shown the version that doesn't include a datetime index, and thus it works correctly. It would be easier to reproduce your problem if you posted [code that causes the error you're getting](https://stackoverflow.com/help/minimal-reproducible-example). – CrazyChucky Apr 24 '21 at 03:07
  • Still need help here if anyone is willing. Answer below was helpful in eliminating errors but the column in question Z30DELTA is empty. See all comments below if needed. – fxbaba108 Apr 24 '21 at 23:43

2 Answers

0

`df.loc[]` is label-based indexing. If your index is datetime, you can't use `df.loc[i, 'col']` with an integer `i` to access a value, since there is no label `i` in a datetime index. You can use `df.loc[df.index[i], 'col']` instead.

def condition(df,i):
    if  df.loc[i, 'Z30'] > 1 or  df.loc[i, 'Z30'] < -1:
        return 1
    else:
        return 0

for i in range(1, len(df)):
    df.loc[df.index[i], 'Z30DELTA'] = df.loc[df.index[i-1], 'Z30DELTA'] + df.loc[df.index[i], 'Z30']* condition(df, df.index[i])
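
A minimal sketch of the label-versus-position distinction described above. It uses the question's Z30 values on a hypothetical daily datetime index (the dates are an assumption for illustration, not the asker's data):

```python
import pandas as pd

Z30 = [1, 1.2, 0.85, 0.50, -0.50, -1.20, -1.85, 0.75, 1.5, 2]

# Hypothetical dates, chosen only for illustration
idx = pd.date_range("2021-01-01", periods=len(Z30), freq="D")
df = pd.DataFrame({"Z30": Z30}, index=idx)

# .loc looks up labels, and the integer 1 is not a label in a datetime index
try:
    df.loc[1, "Z30"]
    lookup_failed = False
except (KeyError, TypeError):
    lookup_failed = True

# Translate the position into a label first, then use .loc
by_label = df.loc[df.index[1], "Z30"]

# Or stay purely positional with .iloc
by_position = df.iloc[1, df.columns.get_loc("Z30")]

print(lookup_failed, by_label, by_position)
```

Both `by_label` and `by_position` read the second row; which form to use is mostly a matter of whether the rest of the loop works with labels or with positions.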
Ynjxsjmh
  • Thank you for your answer @ynjxsjmh. I understand the logic, but it's not quite working. It doesn't give me errors, but the column "Z30DELTA" is empty. Based on your explanation, wouldn't I also have to replace i with total_df.index[i] within the condition statement as well as in the for loop? – fxbaba108 Apr 24 '21 at 12:22
  • I noticed the problem lies in not getting the previous value of "Z30DELTA": df.loc[total_df.index[i-1], 'Z30DELTA'] shows 'nan'. Somehow it does get the previous value of a different column if I change it; for example, df.loc[total_df.index[i-1], 'Z30'] does work. – fxbaba108 Apr 24 '21 at 13:55
  • @RicardoDacosta What is `total_df`? You must ensure the index of `total_df` and `df` are the same to use syntax `df.loc[total_df.index[i-1]]`. – Ynjxsjmh Apr 24 '21 at 14:10
  • My apologies @ynjxsjmh, total_df is the name of my data frame. In my document, I do have total_df.loc[total_df.index[i-1], 'Z30DELTA'] but it doesn't work. It does work with a different column like 'Z30'. – fxbaba108 Apr 24 '21 at 14:33
  • @RicardoDacosta It works fine with your given data. Could you check by adding `print(total_df.loc[total_df.index[i-1], 'Z30DELTA'])` under the for-loop to see if it outputs the correct result? – Ynjxsjmh Apr 24 '21 at 14:37
  • Negative @ynjxsjmh. It prints out 0 and then nan until the end. If I change the column to Z30 it does print out values. I thought perhaps it is due to the initial total_df['Z30DELTA'] = 0, but even if I change that to = 1 the first output is 1 and after that it is all nan. – fxbaba108 Apr 24 '21 at 15:06
  • @RicardoDacosta The value of `Z30DELTA` relies on `Z30`. Could you also check the result of `df.loc[df.index[i], 'Z30']* condition(df, df.index[i])` is right? – Ynjxsjmh Apr 24 '21 at 15:10
  • I did the following: ```print(total_df.loc[total_df.index[i], 'Z30'] * condition(total_df,i)) print(condition(total_df,i))``` and I get both values 1 when condition is met and Z30 value – fxbaba108 Apr 24 '21 at 15:18
  • @RicardoDacosta I have no idea what is going on. Do you add `Z30DELTA` column with `df['Z30DELTA'] = 0` before for loop? – Ynjxsjmh Apr 24 '21 at 15:23
  • Yes, before the loop. Let me ask you: my condition statement is ```def condition(total_df, i)```. I tried changing it to ```def condition(total_df, total_df.index[i])``` but that gives me errors. Could the issue be there, since i is an integer and my index is datetime? – fxbaba108 Apr 24 '21 at 15:34
  • @RicardoDacosta You shouldn't define `total_df.index[i]` in function argument. I use https://paste.ubuntu.com/p/832zWJYShm/ to test your code. – Ynjxsjmh Apr 25 '21 at 01:08
  • I'm still trying to make this work, and I noticed the following. Currently my df is imported from API data as a dataframe, and then all the calculations take place. I tried first importing the data as an Excel file and then using ```total_df = pd.read_excel```. When the data is imported from the Excel file, the formula does work, because it creates a new numerical index under column "A", which is not present when I use the API data instead of the Excel file. Since I don't need the index to be datetime, perhaps I just need to add a new numerical column as the index for it to work? How would I do that? – fxbaba108 Apr 25 '21 at 16:30
  • I figured it out @ynjxsjmh just needed to add ```total_df = total_df.reset_index()``` right before the condition statement. Thank you for your help! – fxbaba108 Apr 25 '21 at 16:47
  • @RicardoDacosta Glad to see you figured your problem out. But after using `total_df.reset_index()` to reset the datetime index, the code in your question should work as written; there is no need to use `df.index[i]`. – Ynjxsjmh Apr 26 '21 at 00:13
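
The `reset_index()` fix reported in the comments above can be sketched as follows. This is only an illustration under stated assumptions: the dates and the index name `date` are hypothetical, and the asker's real `total_df` comes from API data that is not shown in the thread.

```python
import pandas as pd

Z30 = [1, 1.2, 0.85, 0.50, -0.50, -1.20, -1.85, 0.75, 1.5, 2]

# Hypothetical stand-in for the asker's dataframe with a datetime index
idx = pd.date_range("2021-01-01", periods=len(Z30), freq="D")
total_df = pd.DataFrame({"Z30": Z30}, index=idx)
total_df.index.name = "date"  # hypothetical name

# Move the datetime index into a column; the new index is 0, 1, ..., n-1
total_df = total_df.reset_index()
total_df["Z30DELTA"] = 0.0  # float column for the running sum


def condition(df, i):
    # 1 when Z30 lies outside [-1, 1], otherwise 0
    return 1 if df.loc[i, "Z30"] > 1 or df.loc[i, "Z30"] < -1 else 0


# With integer labels, the loop from the question works unchanged
for i in range(1, len(total_df)):
    total_df.loc[i, "Z30DELTA"] = (
        total_df.loc[i - 1, "Z30DELTA"]
        + total_df.loc[i, "Z30"] * condition(total_df, i)
    )

# Optional: restore the datetime index once the column is filled
total_df = total_df.set_index("date")
print(total_df)
```

If the datetime index is not needed afterwards, the final `set_index` call can simply be dropped.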
0

You can try this:

import pandas as pd 
import numpy as np

Z30 = [1,1.2,0.85,0.50,-0.50,-1.20,-1.85,0.75,1.5,2]

df = pd.DataFrame({'Z30': Z30})
df = df.set_index(pd.date_range(start='1/1/2018', periods=10))
df['Z30DELTA'] = 0.0  # float, since the running sum holds non-integer values


def condition(df,i):
    if  df.loc[i, 'Z30'] > 1 or  df.loc[i, 'Z30'] < -1:
        return 1
    else:
        return 0

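for i, values in df.iterrows():
    loc = df.index.get_loc(i)  # integer position of the label i
    if loc == 0:
        continue  # no previous row; iloc[-1] would wrap around to the last row
    df.loc[i, 'Z30DELTA'] = df['Z30DELTA'].iloc[loc-1]  + df.loc[i, 'Z30']* condition(df,i)
print(df)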

It gives the desired output.
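
As an alternative that is not taken from this thread, the same running total can be computed without a row loop, so the index type no longer matters. The sketch below assumes the intent is a cumulative sum of the Z30 values that fall outside [-1, 1], with the first row left at 0 as in the question's loop; the dates are hypothetical.

```python
import pandas as pd

Z30 = [1, 1.2, 0.85, 0.50, -0.50, -1.20, -1.85, 0.75, 1.5, 2]

# Hypothetical dates, used only to give the frame a datetime index
df = pd.DataFrame(
    {"Z30": Z30},
    index=pd.date_range("2018-01-01", periods=len(Z30)),
)

# Keep Z30 where it is outside [-1, 1]; use 0.0 everywhere else
step = df["Z30"].where(df["Z30"].abs() > 1, 0.0)

# The question's loop starts at row 1, so the first row contributes nothing
step.iloc[0] = 0.0

# The running total replaces the explicit "previous row" lookup
df["Z30DELTA"] = step.cumsum()
print(df)
```

Because `where` and `cumsum` operate on whole columns, no label or position lookup is involved at all.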

AzulCou
  • Thank you @AzulCou. Your answer, just like the previous answer, provides the correct output in a new document. Unfortunately, when I incorporate the formula into my own dataframe, the content of column Z30DELTA remains empty. At this point it is perhaps obvious that something within my own code must be interfering with the solution. I believe, but correct me if I am wrong, that unless you see my entire program I won't be able to find a solution. The program is about 200 lines. Would you be willing to take a look? – fxbaba108 Apr 25 '21 at 11:20