I am trying to apply a function convert_label() to the column CR_df['label'] of my dataframe and store the results in a separate column, CR_df['y']. However, CR_df['label'] contains NaN values. I want to apply the function only to the cells that are not NaN; where a cell is NaN, the corresponding CR_df['y'] cell should also be NaN.
To be clear, I'm not asking how to check whether the column has NaN values; I need NaN to be returned wherever the input is NaN.
My (error-prone) attempt at a solution:
def convert_label(label):
    if "pos" in label:
        output = 1.0
    elif "neg" in label:
        output = 0.0
    else:
        output = label
    return output
I tried converting the NaN values to strings before applying my function, but now I need to change every "nan" string in CR_df['y'] back to an actual NaN or null value:
CR_df['y'] = CR_df['label'].astype(str).apply(convert_label)
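To make the behaviour I'm after concrete, here is a minimal sketch on a toy Series. The sample labels are made up for illustration, and using Series.map with na_action="ignore" is just one possible way to get this result, not necessarily the best one for my data:

```python
import numpy as np
import pandas as pd


def convert_label(label):
    # Same mapping as my function: "pos" -> 1.0, "neg" -> 0.0, anything else unchanged.
    if "pos" in label:
        return 1.0
    elif "neg" in label:
        return 0.0
    return label


# Made-up sample labels; the real values come from my training file.
labels = pd.Series(["pos", "neg", np.nan, "pos"])

# na_action="ignore" tells Series.map to skip NaN cells and leave them as NaN,
# so convert_label is never called with a float NaN.
y = labels.map(convert_label, na_action="ignore")
print(y)
```

In other words, the third cell should stay NaN rather than become the string "nan".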
I've attached a picture of my output
Also, here is the code that builds my dataframe:
import pandas as pd

CR_train_file = 'data/custrev_train.tsv'
CR_test_file = 'data/custrev_test.tsv'
# The training file has a label column; the test file does not.
CR_train_df = pd.read_csv(CR_train_file, sep='\t', header=None)
CR_train_df.columns = ['index', 'label', 'review']
CR_test_df = pd.read_csv(CR_test_file, sep='\t', header=None)
CR_test_df.columns = ['index', 'review']
CR_test_df
# Stack the train and test rows into a single dataframe.
CR_df = pd.concat([CR_train_df, CR_test_df], axis=0, ignore_index=True)
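The NaN values in CR_df['label'] come from this concatenation: the test dataframe has no label column, so pd.concat fills that column with NaN for the test rows. A small sketch with made-up stand-in data (the frame names and contents below are hypothetical, not my actual files) shows the effect:

```python
import pandas as pd

# Made-up stand-ins for the train and test dataframes.
train = pd.DataFrame({"index": [0, 1], "label": ["pos", "neg"], "review": ["great", "bad"]})
test = pd.DataFrame({"index": [0, 1], "review": ["fine", "awful"]})

combined = pd.concat([train, test], axis=0, ignore_index=True)

# Rows that came from the test frame have no label, so concat fills them with NaN.
print(combined["label"].isna().tolist())  # prints [False, False, True, True]
```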