I have two data frames, one named 'foo' and one named 'bar'. My dataframe 'foo' has some unique columns and my dataframe 'bar' also has some unique columns. However, they both share one column, the column 'google'. I am trying to see if there is a way to keep all columns in data frame 1, 'foo', and add one additional column, the column 'CLRS', which will be 1 if the content in the column 'google' in that row of 'foo' appears somewhere in the column 'google' in 'bar'.
More specifically, let's assume that the dataframes I have are structured like this: 'foo' contains columns: 'foo_1','foo_2', ..., 'google' and bar contains columns: 'bar_1', 'bar_2, ..., 'google'. I want to join/merge 'foo' and 'bar' in a way such that 'foo' has an additional column, 'CLRS', such that 'CLRS' has a 1 if the contents of 'google' in that row of 'foo' appear at some point in the 'google' column of 'bar'. I have tried the following code:
'''
# foo examples
foo['foo1'] = ['dijkstra','TSP',...]
foo['foo2'] = ['Oculus','VR', ...]
.
.
.
foo['google'] = ['search','ads', 'A/B Testing', 'UI' ...]
# bar examples
bar['bar1'] = ['dijkstra','TSP',...]
bar['bar2'] = ['search','ads', ...]
.
.
.
# 'A/B Testing' appears in the column somewhere but 'ads' does
# not
bar['google'] = ['search','google_search', 'TDD', 'UI',
...,'A/B Testing', ...]
# my code
foo_merged =
foo.join(bar, how = 'left')
# my result
foo_merged['foo1'] = ['dijkstra','TSP',...]
foo_merged['foo2'] = ['search','ads', ...]
.
.
.
foo_merged['google'] = ['search','ads', ...]
foo_merged['CLRS'] = ['search','google_search', 'TDD', 'UI',
...]
# What I want as an output for foo_merged is:
foo_merged['foo1'] = ['dijkstra','TSP',...]
foo_merged['foo2'] = ['search','ads', ...]
.
.
.
foo_merged['google'] = ['search','ads', 'A/B Testing', 'UI'
...]
foo_merged['CLRS'] = [1,0,1,1,...]
'''
Unfortunately, after running the previous join code, foo_merged contains all columns of foo and one additional column which contains always the contents of the column 'google' from 'bar'. My desired result instead would be a df such that the additional column 'CLRS' contains 1 if the content of 'google' in that row of 'foo' appears somewhere as a content of the column 'google' in 'bar' and 0 otherwise.