I have a pandas dataframe where 'Column1' and 'Column2' contain a list of words in every row. I need to create a new column with the number of words that appear in both Column1's list and Column2's list for every row. For example, in a specific row I could have ['apple', 'banana'] in Column1 and ['banana', 'orange'] in Column2, and I need to add a third column containing the number 1, since only one word (banana) is in both lists.
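To make the example concrete, here is a minimal sketch of such a dataframe. The first row is the example above; the second row and the `expected` list are made up purely for illustration.

```python
import pandas as pd

# Hypothetical sample data; only the first row comes from the example above.
df = pd.DataFrame({
    "Column1": [["apple", "banana"], ["kiwi", "pear", "plum"]],
    "Column2": [["banana", "orange"], ["pear", "plum", "fig"]],
})

# Desired contents of the new column: the number of words
# shared by the two lists in each row.
expected = [1, 2]
```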
I tried to do it like this:
for index, row in df.iterrows():
    value = len(list(set(row['Column1']) & set(row['Column2'])))
    row['new_column'] = value
But the new column did not appear in the dataframe. I tried a second approach, creating the column first and setting it to 0 and then updating the values like this:
df['new_column'] = 0
for index, row in df.iterrows():
    value = len(list(set(row['Column1']) & set(row['Column2'])))
    df.at[index,'new_column'] = value
But this didn't work either: the column was not updated. I then tried a third approach using .apply, like this:
df['new_column'] = df.apply(lambda x: len(list(set(x['Column1']) & set(x['Column2']))))
And then I got this error:
KeyError: 'Column1'
I don't know why none of this is working, and I don't know any other way to try it. How can I make this work? Thank you!
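A minimal sketch of one way this could work, assuming the KeyError comes from `DataFrame.apply` running column by column by default (`axis=0`), so that the lambda receives a whole column rather than a row and `x['Column1']` looks for a row label that does not exist. Passing `axis=1` hands each row to the lambda instead. The sample data and the `new_column_zip` name are hypothetical, used only for illustration.

```python
import pandas as pd

# Hypothetical sample data mirroring the example in the question.
df = pd.DataFrame({
    "Column1": [["apple", "banana"], ["kiwi", "pear"]],
    "Column2": [["banana", "orange"], ["fig", "grape"]],
})

# axis=1 passes each row to the function, so row["Column1"] is that row's list.
df["new_column"] = df.apply(
    lambda row: len(set(row["Column1"]) & set(row["Column2"])),
    axis=1,
)

# Alternative without apply: pair up the two columns and intersect the sets.
df["new_column_zip"] = [
    len(set(a) & set(b)) for a, b in zip(df["Column1"], df["Column2"])
]

print(df)
```

Both columns should hold the same counts. Note that converting to sets ignores duplicates, so a word that appears twice in one list is still counted once.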