0

I have seen questions like calculate the difference between rows in DataFrame & i understand Pandas provides df.diff() API but my question context is slightly different. DataFrame will consist thousands of rows with volume data till that time of the day. Name column indicates name of instrument. Need to calculate diff between Volume column only for matching Name column.

Input DataFrame :

         Date    Name   Volume
1  2011-01-03     A       10
2  2011-01-03     B       20
3  2011-01-03     C       30
4  2011-01-03     A       40
5  2011-01-03     B       30
6  2011-01-03     C       100
7  2011-01-03     A       140
8  2011-01-03     B       50
9  2011-01-03     C       120

Output DataFrame :

         Date    Name   Volume  Volume Diff
1  2011-01-03     A       10        10
2  2011-01-03     B       20        20
3  2011-01-03     C       30        30
4  2011-01-03     A       40        30
5  2011-01-03     B       30        10
6  2011-01-03     C       100       70
7  2011-01-03     A       140      100
8  2011-01-03     B       50        20
9  2011-01-03     C       120       20
BeingSuman
  • 3,015
  • 7
  • 30
  • 48

1 Answers1

1

Question is vague. What if there are multiple dates in a group?

However, with what you have given, group by name, subtract consecutive rows in Name. Some will return nulls, fill those nulls with values from Volume

   df['Volume Diff'] = df.groupby('Name')['Volume'].apply(lambda x: x.diff(1)).combine_first(df['Volume'])
wwnde
  • 26,119
  • 6
  • 18
  • 32
  • i have given this dataset cause date will be same. Thanks for the solution, will try & update. – BeingSuman Dec 14 '22 at 03:58
  • 1
    Please let me know if it works for you. if doesnt let me know what the issue is and I will help. – wwnde Dec 14 '22 at 04:03
  • it works perfectly, thanks a ton. BTW is there a simple way to convert float values in column to int. Above solution is giving me float values in `volume diff` column, hence the follow-up. – BeingSuman Dec 14 '22 at 14:28