I would like to combine two columns of a pandas DataFrame to create new columns:
df:
id v_1 v_2 v_3
35 'dfa' [u'cszc', u'bdv', u'yhs'] [u'cszc', u'bdv']
78 'dfa' [u'scaw', u'ygf', u'ompt'] [u'ompt', u'bdv']
99 'dfa' [u'svca', u'yve', u'wwca'] [u'thbsd', u'tbs']
I need:
id v_1 v_2 v_3 new_v_4 new_v_5
35 'dfa' [u'cszc', u'bdv', u'yhs'] [u'cszc', u'bdv'] [u'cszc', u'bdv'] 2/3
78 'dfa' [u'scaw', u'ygf', u'ompt'] [u'ompt', u'bdv'] [u'ompt'] 1/3
99 'dfa' [u'svca', u'yve', u'wwca'] [u'thbsd', u'tbs'] [] 0
"new_v_4" should hold the intersection of "v_2" and "v_3" for each row. "new_v_5" should be the ratio of the size of that intersection to the size of "v_2". Both "v_2" and "v_3" have dtype object (each cell holds a list of strings), and I would prefer "new_v_4" to be a list of strings as well. I tried to use "join", but I do not know how to combine two object columns within a single dataframe.
Raw input:
import pandas as pd

df = pd.DataFrame([[35, 'dfa', [u'cszc', u'bdv', u'yhs'], [u'cszc', u'bdv']],
                   [78, 'dfa', [u'scaw', u'ygf', u'ompt'], [u'ompt', u'bdv']],
                   [99, 'dfa', [u'svca', u'yve', u'wwca'], [u'thbsd', u'tbs']]],
                  columns=['id', 'v_1', 'v_2', 'v_3'])
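To make the computation I am after concrete, here is a minimal sketch of a row-wise version. It assumes Python 3 (so "/" is true division) and that every cell in "v_2" and "v_3" is a plain list of strings. The helper name intersect is hypothetical, made up for illustration, and it keeps the items in the order they appear in "v_2". This may well not be the most idiomatic or fastest approach.

```python
import pandas as pd

# Same data as the raw input above.
df = pd.DataFrame(
    [[35, 'dfa', ['cszc', 'bdv', 'yhs'], ['cszc', 'bdv']],
     [78, 'dfa', ['scaw', 'ygf', 'ompt'], ['ompt', 'bdv']],
     [99, 'dfa', ['svca', 'yve', 'wwca'], ['thbsd', 'tbs']]],
    columns=['id', 'v_1', 'v_2', 'v_3'])


def intersect(left, right):
    """Hypothetical helper: items of `left` that also occur in `right`,
    kept in the order they appear in `left`."""
    right_set = set(right)
    return [item for item in left if item in right_set]


# new_v_4: per-row intersection of v_2 and v_3, stored as a list of strings.
df['new_v_4'] = [intersect(a, b) for a, b in zip(df['v_2'], df['v_3'])]

# new_v_5: size of the intersection divided by the size of v_2.
# An empty v_2 list is mapped to 0.0 to avoid dividing by zero (an assumption).
df['new_v_5'] = [
    len(common) / len(a) if a else 0.0
    for common, a in zip(df['new_v_4'], df['v_2'])
]

print(df[['id', 'new_v_4', 'new_v_5']])
```

Because each cell is a Python list, the work happens in a plain loop over rows rather than in a vectorized pandas operation, so I am not sure how well this scales to a large dataframe.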