0

I have a dataframe 'df' from this csv file (link attached) : https://drive.google.com/open?id=1UiarOKWJo3HcdA1lh5EcR5lrtSjCOtAX

I apply the following function to the above df: `

def createKeys(df):
    df.sort_values(by = ['KEY'], inplace = True)
    df['Whatever'] = df['KEY'].apply(lambda x: x.split("_")[0] + "_" + x.split("_")[1])
    df['STRIKE_PRICE'] = df['KEY'].apply(lambda x: float(x.split("_")[2]))
    df['STRIKE_PR'] = df['STRIKE_PRICE']
    df['OPTION_TYP'] = df['KEY'].apply(lambda x: x.split("_")[3])
    df.set_index(keys = ['Whatever','OPTION_TYP'], inplace= True)
    return df

df= createKeys(df)

I get the following data-frame (some columns not present in the picture- horizontal length of table > snip size of image): enter image description here

To each key in "KEY" (index) , I want all the strike prices from the "STRIKE_PR" column (not present in the image) and perform actions separately on the "CE" and "PE" option type (for all those strikes).

Eg. Say I have strikes 1260 CE, 1260 PE , 1280 CE, 1280 PE, 1300 CE , 1300 PE for the key = 'ACC_JUL-2018' I want to filter out these strikes [1260, 1280, 1300] for the group CE and perform actions on these strikes. Similarly for PE. Can you help me do this? Thanks in advance.

1 Answers1

1

Here is what I would do. First, I would create a specific column for my 1280 1260 etc. Let's call it KEY2

def split(s):
    return(s.split("_")[2])

df["key2"] = df["KEY"].apply(split)

Then, I would split my dataframe to perform the actions separately. Say you want to find the median of keys2 for PE, you would say

df[(df["option_TYP"]=="PE")]["key2"].median
zar3bski
  • 2,773
  • 7
  • 25
  • 58