1

I need to calculate the difference between two dates but I need to do it for each pair of rows, then save it in an excel file,for example :

df:

  B
0 2018-02-11 12:14:25                #1st row
1 2018-02-11 12:15:30                #2nd row
2 2018-02-11 12:15:54                #3rd row
3 2018-02-11 12:16:11                #4th row

My program needs to subtract the second row from the first row and the forth row from the third row.

this is my code so far:

import pandas as pd

df = pd.read_excel('test.xlsx',header=0, index= False)
sub ='chan_avail'
df["Indexes"]= df["A"].str.find(sub) 
df["B"]=df['Time'].where(df['Indexes'] == 0)
df1 = df.dropna(subset=['B'])
#print(df1)
df2 = df1.reset_index(drop=True, inplace=True)
df1['B'] = pd.to_datetime(df1['B'])

#print(df1)
xx=len(df1.index)
for i in range(xx):
    if i % 2 == 0:
        print('!!') #test
        df1['diffB'] = df1['B'] - df1['B'].shift(-1)
print(df1)
df1.to_excel('output.xlsx', 'Sheet1', index=True)  

what I intented to do was calculating the difference between rows if the index is pair but it did not work. Also when I tried to save the difference in time in excel I got 0 but in the consol of python I got the correct value, but if i add df1['diffB'] = pd.to_datetime(df1['diffB']) I get an incorrect result.

Thank you for your help.

newbie
  • 646
  • 8
  • 27

1 Answers1

1

You can slicing odd and even values and one Series convert to numpy array for avoid alignment - it means for correct subtract is necessary same index values in both or one ot them convert to array:

df['B'] = pd.to_datetime(df['B'])

df['diffB'] = df['B'].iloc[1::2] - df['B'].iloc[::2].to_numpy()
print (df)
                    B    diffB
0 2018-02-11 12:14:25      NaT
1 2018-02-11 12:15:30 00:01:05
2 2018-02-11 12:15:54      NaT
3 2018-02-11 12:16:11 00:00:17

Or:

df['diffB'] = df['B'].iloc[1::2].to_numpy() - df['B'].iloc[::2]
print (df)
                    B    diffB
0 2018-02-11 12:14:25 00:01:05
1 2018-02-11 12:15:30      NaT
2 2018-02-11 12:15:54 00:00:17
3 2018-02-11 12:16:11      NaT

Solution working if also odd number of rows:

df['B'] = pd.to_datetime(df['B'])
df['diffB'] = df.groupby(np.arange(len(df)) // 2)['B'].shift(-1) - df['B']
print (df)
                    B    diffB
0 2018-02-11 12:14:25 00:01:05
1 2018-02-11 12:15:30      NaT
2 2018-02-11 12:15:54 00:00:17
3 2018-02-11 12:16:11      NaT
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252
  • Thank you so much for your help, but if you can please explain to me what does `iloc[1::2]` mean, because I need to apply it in a dataframe that contains n rows – newbie Sep 11 '19 at 11:58
  • 1
    @A.khou - It is same principe like [slicing list in python](https://stackoverflow.com/a/4988012), only for pandas is added `iloc`. – jezrael Sep 11 '19 at 11:59
  • if the length of my dataframe is odd I get ValueError: cannot add indices of unequal length.I tried to do it like that: `a=len(df1.index) if a % 2 == 0: for i in range (a): df['diffB'] = df['B'].iloc[1::2].to_numpy() - df['B'].iloc[::2] elif a % 2 == 1: for i in range (a): df['diffB'] = df['B'].iloc[2::].to_numpy() - df['B'].iloc[::2]` But it did not work – newbie Sep 11 '19 at 12:46
  • @A.khou - How working alternative solution `df['diffB'] = df.groupby(np.arange(len(df)) // 2)['B'].diff().shift(-1)` ? If possible, remove `shift` – jezrael Sep 11 '19 at 12:51
  • 1
    Or `df['diffB'] = df.groupby(np.arange(len(df)) // 2)['B'].shift(-1) - df['B']` – jezrael Sep 11 '19 at 12:53
  • Thank you, that helped alot, but is there a way to skip the first row beceause I could not manage to determine where should I add `skiprows=1` in this line of code – newbie Sep 11 '19 at 13:08
  • @A.khou - Sorry, not understand. :( – jezrael Sep 11 '19 at 13:32
  • if the length of the df is odd, for example 3, I should ignore the first row and then I should calculate the difference between the 2nd row and the 3rd row – newbie Sep 11 '19 at 13:35