
I have time-series data in the following format:

| quote_datetime      | Moneyness   | underlying_bid | askC | askP | bidC | bidP | fwd_premium_abs |
|---------------------|-------------|----------------|------|------|------|------|-----------------|
| 2020-02-25 14:10:00 | 0.980861244 | 3134.6         | 73.8 | 10.8 | 66.5 | 10.4 | 63.4            |
| 2020-02-25 14:10:00 | 0.990861244 | 3134.6         | 73.4 | 11.8 | 63.5 | 11.4 | 65.4            |
| 2020-02-25 14:11:00 | 0.990861244 | 3135.6         | 70.4 | 12.8 | 61.5 | 14.4 | 64.4            |
| 2020-02-25 14:11:00 | 0.990861244 | 3135.6         | 72.4 | 10.1 | 60.1 | 12.4 | 67.4            |

For each unique value in quote_datetime, I need the minimum fwd_premium_abs. I tried doing it with a for loop:

# store the minimum fwd_premium_abs for each timestamp
results = {}
for j in df['quote_datetime'].unique():
    results[j] = df[df['quote_datetime'] == j]['fwd_premium_abs'].min()

but this is very inefficient and computationally expensive over a huge dataset. What is the optimal way of doing it?

Here is a small part of the data in dictionary format:

{'strike': {0: 3075.0, 1: 3075.0, 2: 3075.0, 3: 3075.0, 4: 3075.0},
 'Date': {0: datetime.date(2020, 2, 25),
  1: datetime.date(2020, 2, 25),
  2: datetime.date(2020, 2, 25),
  3: datetime.date(2020, 2, 25),
  4: datetime.date(2020, 2, 25)},
 'quote_datetime': {0: Timestamp('2020-02-25 14:10:00'),
  1: Timestamp('2020-02-25 14:12:00'),
  2: Timestamp('2020-02-25 14:19:00'),
  3: Timestamp('2020-02-25 14:20:00'),
  4: Timestamp('2020-02-25 14:22:00')},
 'Moneyness': {0: 0.9808612440191388,
  1: 0.9808612440191388,
  2: 0.9808612440191388,
  3: 0.9808612440191388,
  4: 0.9808612440191388},
 'underlying_bid': {0: 3134.6, 1: 3135.8, 2: 3137.29, 3: 3136.91, 4: 3136.99},
 'askC': {0: 73.8, 1: 74.4, 2: 76.7, 3: 74.8, 4: 74.2},
 'askP': {0: 10.8, 1: 10.9, 2: 10.5, 3: 10.7, 4: 10.7},
 'bidC': {0: 66.5, 1: 69.1, 2: 70.1, 3: 71.7, 4: 71.2},
 'bidP': {0: 10.4, 1: 10.3, 2: 9.4, 3: 10.2, 4: 10.2},
 'fwd_premium_abs': {0: 63.4, 1: 64.10000000000001, 2: 67.3, 3: 64.6, 4: 64.0}}
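
To load this sample yourself, a minimal sketch (assuming the dictionary above has been pasted into a variable named data; the variable name is chosen here just for illustration):

import pandas as pd

# `data` holds the df.to_dict() output posted above
df = pd.DataFrame(data)
print(df.head())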
yash
  • Sounds like you'll need to use a groupby, but it's hard to tell without being able to reproduce your data. Please post your data as a dictionary using df.to_dict() – Joe Carboni Aug 14 '22 at 23:43
  • Added a small part of the data dictionary @joe_carboni. In the larger dataset, there are multiple rows for each 'quote_datetime' – yash Aug 14 '22 at 23:49
  • Your code that you "tried" would not work – Vladimir Fokow Aug 14 '22 at 23:51
  • It works fine on a small dataset, but for 1 million rows it becomes slow because of the for loop – yash Aug 14 '22 at 23:52
  • Sorry, my bad, that was for filtering data. I have corrected it again for i and j – yash Aug 14 '22 at 23:55

2 Answers


Use groupby on the 'quote_datetime' column to find the minimum of 'fwd_premium_abs':

df.groupby('quote_datetime')['fwd_premium_abs'].min()

output

quote_datetime
2020-02-25 14:10:00    63.4
2020-02-25 14:12:00    64.1
2020-02-25 14:19:00    67.3
2020-02-25 14:20:00    64.6
2020-02-25 14:22:00    64.0

Is this what you were looking for?

Since you provided all unique datetimes to begin with, I changed one datetime to demonstrate what this looks like with a repeated Timestamp (now at row indices 1 and 2):

>>> df.iat[2,2] = Timestamp('2020-02-25 14:12:00')
>>> df
   strike        Date      quote_datetime  Moneyness  underlying_bid  askC  askP  bidC  bidP  fwd_premium_abs
0  3075.0  2020-02-25 2020-02-25 14:10:00   0.980861         3134.60  73.8  10.8  66.5  10.4             63.4
1  3075.0  2020-02-25 2020-02-25 14:12:00   0.980861         3135.80  74.4  10.9  69.1  10.3             64.1
2  3075.0  2020-02-25 2020-02-25 14:12:00   0.980861         3137.29  76.7  10.5  70.1   9.4             67.3
3  3075.0  2020-02-25 2020-02-25 14:20:00   0.980861         3136.91  74.8  10.7  71.7  10.2             64.6
4  3075.0  2020-02-25 2020-02-25 14:22:00   0.980861         3136.99  74.2  10.7  71.2  10.2             64.0


df.groupby('quote_datetime')['fwd_premium_abs'].min()

output

quote_datetime
2020-02-25 14:10:00    63.4
2020-02-25 14:12:00    64.1
2020-02-25 14:20:00    64.6
2020-02-25 14:22:00    64.0
Joe Carboni
  • This is what I was looking for, but I didn't use groupby for a reason: I needed the entire row entry at the minimum instead of just the minimum value of fwd_premium_abs. E.g. if I do pivot_table.groupby('quote_datetime')[['fwd_premium_abs','strike']].min(), I will get the minimum fwd_premium_abs but I don't get the strike corresponding to that minimum value; instead I get the minimum of both (see the sketch below) – yash Aug 15 '22 at 00:46
  • Thank you so much for such a detailed response – yash Aug 15 '22 at 00:52
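
To illustrate the pitfall described in the comment above, a quick sketch (assuming the sample df from the question): taking .min() over several columns minimizes each column independently, so the strike reported need not come from the row with the smallest fwd_premium_abs.

# each column's minimum is computed on its own; the values need not come from the same row
print(df.groupby('quote_datetime')[['fwd_premium_abs', 'strike']].min())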

You can use .transform after grouping if you want your dataframe to stay in the same shape:

df['fwd_premium_abs'] = df.groupby('quote_datetime')['fwd_premium_abs'].transform('min')

All rows with the same 'quote_datetime' will have the same value in the column 'fwd_premium_abs'. This value will be the minimum of their previous values in this column.
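
For example, a sketch on the sample frame (the new column name min_fwd is chosen here for illustration, so the original values stay visible next to the group minimum):

# write the per-timestamp minimum to a separate column instead of overwriting
df['min_fwd'] = df.groupby('quote_datetime')['fwd_premium_abs'].transform('min')
print(df[['quote_datetime', 'fwd_premium_abs', 'min_fwd']])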


Edit:

If you want to select the rows:

df.loc[df.groupby('quote_datetime')['fwd_premium_abs'].idxmin()]
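
As a quick check (a sketch assuming the same sample df): idxmin returns the index label of the first row holding each group's minimum, and .loc then pulls those complete rows, so strike, Moneyness, etc. stay aligned with the minimal fwd_premium_abs.

# one full row per quote_datetime, taken at the position of the group's minimum
min_rows = df.loc[df.groupby('quote_datetime')['fwd_premium_abs'].idxmin()]
print(min_rows)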
Vladimir Fokow
  • Thanks, this was helpful, but I needed the entire rows corresponding to the minimum value of 'fwd_premium_abs' at every 'quote_datetime'. Sorry for not being clear – yash Aug 15 '22 at 00:51
  • @yash added a solution to select the entire rows – Vladimir Fokow Aug 15 '22 at 01:02