Here is a dataframe of financial instrument identifiers
import pandas as pd
import numpy as np
df = pd.DataFrame([["ISIN1", "CUSIP1", "SEDOL1"],
["ISIN2", "CUSIP2", "SEDOL2"],
["ISIN3", "CUSIP3", "SEDOL3"],
["ISIN4", "CUSIP4", "SEDOL4"]],
columns=["ISIN", "CUSIP", "SEDOL"])
df
ISIN CUSIP SEDOL
0 ISIN1 CUSIP1 SEDOL1
1 ISIN2 CUSIP2 SEDOL2
2 ISIN3 CUSIP3 SEDOL3
3 ISIN4 CUSIP4 SEDOL4
Imagine that several entries are missing
df.iloc[(1,1)] = np.nan
df.iloc[(1,2)] = np.nan
df.iloc[(2,0)] = np.nan
df.iloc[(3,0)] = np.nan
df.iloc[(3,1)] = np.nan
df
ISIN CUSIP SEDOL
0 ISIN1 CUSIP1 SEDOL3
1 ISIN2 NaN NaN
2 NaN CUSIP3 SEDOL3
3 NaN NaN SEDOL4
In column ID I want to capture a single variable based on this hierarchy : If ISIN is missing, I want to populate CUSIP. If CUSIP is also missing I want to populate SEDOL.
I tried this nested if statement:
def identifier(row):
if ~pd.isnull(row['ISIN']):
return row['ISIN']
elif pd.isnull(row['ISIN']) & ~pd.isnull(row['CUSIP']):
return row['CUSIP']
elif pd.isnull(row['ISIN']) & pd.isnull(row['CUSIP']) & ~pd.isnull(row['SEDOL']):
return row['SEDOL']
df['ID'] = df[['SEDOL', 'CUSIP', 'ISIN']].apply(identifier, axis=1)
Which returned bad output in the ID column for the last 2 entries.
ISIN CUSIP SEDOL ID
0 ISIN1 CUSIP1 SEDOL1 ISIN1
1 ISIN2 NaN NaN ISIN2
2 NaN CUSIP3 SEDOL3 NaN
3 NaN NaN SEDOL4 NaN
My expected output is this:
ISIN CUSIP SEDOL ID
0 ISIN1 CUSIP1 SEDOL1 ISIN1
1 ISIN2 NaN NaN ISIN2
2 NaN CUSIP3 SEDOL3 CUSIP3
3 NaN NaN SEDOL4 SEDOL4
Hope i've explained that well. Note that 'ISIN' is a string. There is no use of the .isin function in my code. Thank you in advance.