The following code transforms a given pandas column, FEAT, into a new binary feature named STREAM. The program works as long as there are no NaN values in the original dataframe. If there are NaN values, the following exception occurs: ValueError: Length of values does not match length of index.
I need the NaN values to carry over into the new column. Is that doable?
Here is the code that fails:
import pandas as pd
import numpy as np
data = {
'FEAT': [8, 15, 7, np.nan, 5, 2, 11, 15]
}
customer = pd.DataFrame(data)
customer = pd.DataFrame(data, index=['June', 'Robert', 'Lily', 'David', 'Bob', 'Sally', 'Mia', 'Luis'])
#create binary variable STREAM 0:mainstream 1:avantgarde
stream_0 = [1, 3, 5, 8, 10, 12, 14]
stream_1 = [2, 4, 6, 7, 9, 11, 13, 15]
# convert FEAT to list_0
list_0 = customer['FEAT'].values.tolist()
# create a list of length = len(customer) whose elements are:
# 0 if the value of 'FEAT' is in stream_0
# 1 if the value of 'FEAT' is in stream_1
L = []
for i in list_0:
    if i in stream_0:
        L.append(0)
    elif i in stream_1:
        L.append(1)
# convert the list to a new column of customer df
customer['STREAM'] = L
print(customer)
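
For reference, the likely cause is that NaN matches neither stream_0 nor stream_1, so the loop appends nothing for that row and L ends up with 7 elements for 8 rows. A minimal sketch of one possible fix, assuming the goal is simply to keep NaN in STREAM wherever FEAT matches neither list, is to add an else branch that appends np.nan (the else branch is my addition, everything else is taken from the code above):

```python
import numpy as np
import pandas as pd

data = {
    'FEAT': [8, 15, 7, np.nan, 5, 2, 11, 15]
}
customer = pd.DataFrame(
    data,
    index=['June', 'Robert', 'Lily', 'David', 'Bob', 'Sally', 'Mia', 'Luis'],
)

# binary variable STREAM 0:mainstream 1:avantgarde
stream_0 = [1, 3, 5, 8, 10, 12, 14]
stream_1 = [2, 4, 6, 7, 9, 11, 13, 15]

L = []
for i in customer['FEAT'].values.tolist():
    if i in stream_0:
        L.append(0)
    elif i in stream_1:
        L.append(1)
    else:
        # Assumption: any value in neither list (including NaN) becomes NaN,
        # so L always has exactly one entry per row.
        L.append(np.nan)

customer['STREAM'] = L
print(customer)
```

With this change the list always matches the length of the index, so the assignment no longer raises. Note that STREAM becomes a float column, because NaN cannot be stored in a plain integer column.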