I need to check, for every row in a dataframe, whether the value in a column named by another column of that row is above 0.
tshirt  pants  sweater  socks  Product_1  Product_2  Product_3  Expected
0       1      0        1      sweater    tshirt     pants      True
1       1      0        1      sweater    tshirt     socks      True
0       1      0        0      socks      sweater    socks      False
1       1      0        1      sweater    tshirt     sweater    True
0       0      0        0      socks      sweater    tshirt     False
So, for example, if the value in column 'Product_1' is 'tshirt', I need to check whether the value in the tshirt column of that row is above 0.
If the value is above 0 for at least one of the items named in the three 'Product' columns, a new column should say True, else False (see the Expected column).
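To make the rule concrete, here is a minimal sketch that rebuilds the Expected column for the five sample rows. It assumes every name in the 'Product' columns is also an item column; the names `item_cols`, `product_cols` and `hit` are only for illustration, and this is one possible approach rather than a definitive solution.

```python
import numpy as np
import pandas as pd

# The five sample rows from the table above
df = pd.DataFrame({
    'tshirt':    [0, 1, 0, 1, 0],
    'pants':     [1, 1, 1, 1, 0],
    'sweater':   [0, 0, 0, 0, 0],
    'socks':     [1, 1, 0, 1, 0],
    'Product_1': ['sweater', 'sweater', 'socks', 'sweater', 'socks'],
    'Product_2': ['tshirt', 'tshirt', 'sweater', 'tshirt', 'sweater'],
    'Product_3': ['pants', 'socks', 'socks', 'sweater', 'tshirt'],
})
item_cols = ['tshirt', 'pants', 'sweater', 'socks']          # illustrative name
product_cols = ['Product_1', 'Product_2', 'Product_3']       # illustrative name

values = df[item_cols].to_numpy()
rows = np.arange(len(df))
hit = np.zeros(len(df), dtype=bool)

for col in product_cols:
    # Position of each named item within item_cols
    # (assumption: every name is present, otherwise get_indexer returns -1)
    pos = pd.Index(item_cols).get_indexer(df[col])
    # Look up, per row, the value of the item named in this Product column
    hit |= values[rows, pos] > 0

df['Expected'] = hit
print(df['Expected'].tolist())
```

A row is True as soon as any one of its three named items has a value above 0.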
Code to produce sample data:
import pandas as pd
import numpy as np

recomendations = ['tshirt', 'pants', 'sweater', 'socks']
size = 100
data = pd.DataFrame()

# Generate data: one 0/1 column per item, plus Product_1 to Product_3
for idx, i in enumerate(recomendations, start=1):
    data[i] = np.random.choice([0, 1], size=size)
    if idx <= 3:
        data[f'Product_{idx}'] = np.random.choice(recomendations, size=size)

# Reorder columns: item columns first, then the Product columns
data = data[recomendations + ['Product_1', 'Product_2', 'Product_3']]
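So far I have computed the percentage of True values in a very slow way, by looping over the frame inside a function that takes the frame and one of the 'Product' columns:

# The function name is a placeholder
def compute_share(frame, column):
    track = []
    no_purchase = 0
    str_cols = ['Product_1', 'Product_2', 'Product_3']
    item_cols = [i for i in frame.columns if i not in str_cols]
    # Series.items() replaces iteritems(), which was removed in pandas 2.0
    for idx, val in frame[column].items():
        # Look up the item named in `column` by label, not by position
        if frame.loc[idx, val] > 0:
            track.append(1)
        else:
            track.append(0)
            # Count misses where nothing at all was bought in this row
            if frame.loc[idx, item_cols].sum() < 1:
                no_purchase += 1
    # Share of misses that come from rows without any purchase
    result = no_purchase / (len(track) - np.sum(track))
    return result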
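For comparison, the same ratio could probably be computed without a Python-level loop. The sketch below is an assumption on my part, not a tested solution: `no_purchase_share_vectorised` and `item_cols` are hypothetical names, and it assumes the item columns hold non-negative numbers and that every name in the chosen 'Product' column is one of them.

```python
import numpy as np
import pandas as pd

def no_purchase_share_vectorised(frame, column, item_cols):
    """Share of misses in `column` that come from rows with no purchase at all."""
    values = frame[item_cols].to_numpy()
    # Column position of the item named in each row
    # (assumption: every name is in item_cols, otherwise get_indexer returns -1)
    pos = pd.Index(item_cols).get_indexer(frame[column])
    # Per-row lookup of the named item's value
    bought = values[np.arange(len(frame)), pos] > 0
    miss = ~bought
    nothing_bought = values.sum(axis=1) < 1
    # Same ratio as the loop: no-purchase misses divided by all misses
    return (miss & nothing_bought).sum() / miss.sum()

# Small frame reusing the five sample rows
frame = pd.DataFrame({
    'tshirt':    [0, 1, 0, 1, 0],
    'pants':     [1, 1, 1, 1, 0],
    'sweater':   [0, 0, 0, 0, 0],
    'socks':     [1, 1, 0, 1, 0],
    'Product_1': ['sweater', 'sweater', 'socks', 'sweater', 'socks'],
    'Product_2': ['tshirt', 'tshirt', 'sweater', 'tshirt', 'sweater'],
    'Product_3': ['pants', 'socks', 'socks', 'sweater', 'tshirt'],
})
item_cols = ['tshirt', 'pants', 'sweater', 'socks']

share = no_purchase_share_vectorised(frame, 'Product_2', item_cols)
print(share)
```

If the chosen column has no misses at all, the denominator is zero and the result is not a meaningful ratio, so that case would need separate handling.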