I have a dataframe that looks like this:
import numpy as np
import pandas as pd

df = pd.DataFrame({"jointid": ['ab', 'ac', 'bc'],
                   "id": ['a', 'a', 'b'],
                   "dog": [0, 0, 0],
                   "cat": [1, 1, 1],
                   "id2": ['b', 'c', 'c'],
                   "dog2": [0, 1, 1],
                   "cat2": [1, 0, 0],
                   "common": [np.nan, np.nan, np.nan]})
I need to fill the common column with a dummy variable equal to 1 when both ids on the row have the same animal category (that is, animal == animal2). I use dog and cat here, but in the full data set there are 80 categories, each appearing twice per row, and I need to check all of these pairs. The desired output for this example is:
df = pd.DataFrame({"jointid": ['ab', 'ac', 'bc'],
                   "id": ['a', 'a', 'b'],
                   "dog": [0, 0, 0],
                   "cat": [1, 1, 1],
                   "id2": ['b', 'c', 'c'],
                   "dog2": [0, 1, 1],
                   "cat2": [1, 0, 0],
                   "common": [1, 0, 0]})
I have tried a lot of different methods, but the hang-up seems to be in using the column names as a list. Here's the gist of what I've been trying:
net = list(df.loc[:, 'dog':'cat'].columns)
for x in net:
    df['common'] = np.where(df[x] == df[x + '2'], 1, 0)
Either I get a value of 1 assigned to everything, or I get errors related to the list. Any help is appreciated!
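For reference, here is a minimal sketch of one way this could be done without a loop. It assumes that "same animal category" means at least one category is flagged 1 for both ids on the row, and that every first-id category column has a partner named with a trailing '2'. The names net2 and both are made up for illustration. One likely problem with the loop above is that every iteration overwrites common, so only the last category in the list decides the result, and == also counts rows where both values are 0.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"jointid": ['ab', 'ac', 'bc'],
                   "id": ['a', 'a', 'b'],
                   "dog": [0, 0, 0],
                   "cat": [1, 1, 1],
                   "id2": ['b', 'c', 'c'],
                   "dog2": [0, 1, 1],
                   "cat2": [1, 0, 0]})

# Category columns for the first id (assumes they sit contiguously from 'dog' to 'cat')
net = list(df.loc[:, 'dog':'cat'].columns)
# Matching columns for the second id (assumes the '2' suffix naming holds for all 80)
net2 = [x + '2' for x in net]

# Compare the raw arrays so the differing column labels do not get in the way.
# both[i, j] is True when category j is flagged for both ids on row i.
both = (df[net].to_numpy() == 1) & (df[net2].to_numpy() == 1)

# 1 if any category is shared on the row, otherwise 0
df['common'] = both.any(axis=1).astype(int)

print(df['common'].tolist())
```

If "same" should instead mean that all categories are identical across the two ids, the comparison could be df[net].to_numpy() == df[net2].to_numpy() followed by .all(axis=1).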