This is the UCI congressional voting records dataset, where 1.0 is yay, 0.0 is nay, and NaN is abstain. The second set of columns is what I'm trying to add to the dataframe, but those values are incorrect. I am trying to binarize this dataframe so that I have something like:
100 for yay
010 for nay
001 for abstain
so I can run association rules.
I was able to create 16 extra columns for abstain (there are 16 votes, v1 to v16).
However, when I try to create the 16 nay columns by checking the value in the original vote column shown above, it does not work. As you can see above for nay_v1, it should be 1,1,0,1,0 but it is 1,1,1,1,1.
The abstain columns were created using isna(), but for nay I want to check whether the vote column value is 0.0 and, if so, put 1.0 in the nay column for that vote.
I tried two ways using loc and iloc, based on answers on this site, but neither works. I think both produced the output I posted above.
First method:
for (idx, row) in cvotes.iterrows():
    for c in cols:
        if row.loc[c] == 0.0:
            cvotes[f'nay_{c}'] = 1.0
        elif row.loc[c] == 1.0:
            cvotes[f'nay_{c}'] = 0.0
        elif row.loc[c] == np.nan:
            cvotes[f'nay_{c}'] = 0.0
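One detail of this first attempt can be checked in isolation: the `== np.nan` branch. The sketch below uses a stand-in variable `val` (hypothetical, not from the real dataframe) to show that a NaN never compares equal to `np.nan`, so that branch cannot be taken, whereas `pd.isna` does detect it.

```python
import numpy as np
import pandas as pd

# Stand-in for a single abstain vote (hypothetical value, not real data).
val = np.nan

# NaN compares unequal to everything, including itself,
# so an `elif x == np.nan:` branch is never entered.
equals_nan = (val == np.nan)

# pd.isna is the reliable way to test a scalar for NaN.
is_missing = pd.isna(val)

print(equals_nan, is_missing)
```

This matches how the abstain columns were built with isna() rather than with an equality test.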
Second method:
for c in cols:
    for i in range(len(cvotes.iloc[:][c])):
        val = cvotes.iloc[i][c]
        if val == 0.0:
            cvotes[f'nay_{c}'] = 1.0
        else:
            cvotes[f'nay_{c}'] = 0.0
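A possible explanation for the constant columns, sketched on a hypothetical five-row frame named `demo` (not the real data): assigning a scalar to `df[column]` sets every row of that column, so each pass through the loop overwrites the whole column and only the last row's vote decides the final value.

```python
import numpy as np
import pandas as pd

# Hypothetical five-row sample, not the real dataset.
demo = pd.DataFrame({"v1": [0.0, 0.0, np.nan, 0.0, 1.0]})

for i in range(len(demo)):
    val = demo["v1"].iloc[i]
    # A scalar assigned to a column is broadcast to all rows,
    # so this line replaces the entire column on every pass.
    demo["nay_v1"] = 1.0 if val == 0.0 else 0.0

# Only the last row (v1 == 1.0) determined the outcome.
result = demo["nay_v1"].tolist()
print(result)
```

In this sample the last vote is 1.0, so the whole column ends up 0.0. With a last vote of 0.0 the same loop would give all 1.0, which would be consistent with the 1,1,1,1,1 output described earlier.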
What am I doing wrong here? It's fairly frustrating because I thought I was okay with numpy array indexing and even Python list indexing.
Edit:
Sample dataframe:
cvotes = pd.read_csv('house-votes-84.data', sep=',', header=None)
cvotes.head()
cvotes.columns = ['party', 'v1', 'v2', 'v3', 'v4', 'v5', 'v6', 'v7',
                  'v8', 'v9', 'v10', 'v11', 'v12', 'v13', 'v14', 'v15',
                  'v16']
cvotes.head()
Download csv from: http://archive.ics.uci.edu/ml/datasets/Congressional+Voting+Records
This is the result I want:
v1 nay_v1
0.0 1.0
0.0 1.0
NaN 0.0
0.0 1.0
1.0 0.0
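For reference, a minimal sketch of how that column could be produced without a row loop, using whole-column comparisons. The frame `sample` is a hypothetical five-row stand-in for the table above, and `cols` is assumed to be the list of vote column names. The `yay_` and `abstain_` names are illustrative only.

```python
import numpy as np
import pandas as pd

# Hypothetical five-row sample matching the desired-result table.
sample = pd.DataFrame({"v1": [0.0, 0.0, np.nan, 0.0, 1.0]})
cols = ["v1"]  # assumption: the list of vote column names

for c in cols:
    # Each comparison yields one boolean per row. NaN == 0.0 is False,
    # so abstentions become 0.0 in both the yay and nay columns.
    sample[f"yay_{c}"] = sample[c].eq(1.0).astype(float)
    sample[f"nay_{c}"] = sample[c].eq(0.0).astype(float)
    sample[f"abstain_{c}"] = sample[c].isna().astype(float)

print(sample)
```

Each row then has exactly one 1.0 across the three new columns, which is the 100 / 010 / 001 pattern described at the top.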
I updated my code, but now I just get 0's:
# make cols for is nay
for c in cols:
    # make column preset to val
    cvotes[f'nay_{c}'] = 0.0
    # iterate and change vals on vote col condition
    for i in range(len(cvotes.iloc[:][c])):
        val = cvotes.iloc[i][c]
        #print(val)
        if val == 0.0:
            cvotes.iloc[i][f'nay_{c}'] = 1.0
        else:
            cvotes.iloc[i][f'nay_{c}'] = 0.0
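A likely cause of the all-zero result is chained indexing: as far as I understand, `cvotes.iloc[i]` builds a separate row Series (a copy when the columns have mixed dtypes, as with `party` next to the float votes), so `cvotes.iloc[i][name] = ...` writes to that temporary object and not to the dataframe. A sketch of the same loop with a single `.loc` call, on a hypothetical three-row frame named `demo`:

```python
import numpy as np
import pandas as pd

# Hypothetical three-row sample with mixed dtypes, like the real data.
demo = pd.DataFrame({
    "party": ["republican", "democrat", "democrat"],
    "v1": [0.0, np.nan, 1.0],
})
demo["nay_v1"] = 0.0  # preset, as in the updated code

for i in range(len(demo)):
    if demo["v1"].iloc[i] == 0.0:
        # One indexing call with both the row label and the column name,
        # so the write goes to the dataframe itself.
        demo.loc[demo.index[i], "nay_v1"] = 1.0

print(demo)
```

The loop is kept only to mirror the code above. A whole-column comparison would avoid the per-row assignment entirely.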