
How do I compare two columns of a DataFrame in Python and count, for each row, how many items (strings) they have in common? For example:

row | column A | column B |
============================
 1  | ['NNP',  | ['NNP',  |
    |  'NNP',  |  'NNP',  |
    |  'NNP',  |  'VB',   |
    |  'NNP',  |  'NN',   |
    |  'CC',   |  'NN',   |
    |  'RB',   |  'Z']    |
    |  'NN',   |          |
    |  'Z']    |          |
 2  | ['NNP',  | ['NNP',  |
    |  'VB',   |  'NN',   |
    |  'NN',   |  'RB']   |
    |  'NN',   |          |
    |  'Z']    |          |

What I want to get is:

row | column A | column B | count_same_string
=============================================
 1  | ['NNP',  | ['NNP',  | 4
    |  'NNP',  |  'NNP',  |
    |  'NNP',  |  'VB',   |
    |  'NNP',  |  'NN',   |
    |  'CC',   |  'NN',   |
    |  'RB',   |  'Z']    |
    |  'NN',   |          |
    |  'Z']    |          |
 2  | ['NNP',  | ['NNP',  | 2
    |  'VB',   |  'NN',   |
    |  'NN',   |  'RB']   |
    |  'NN',   |          |
    |  'Z']    |          |
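The expected count of 4 for row 1 only works out if duplicates are counted: a tag contributes as many matches as the smaller of its two counts. That reading is inferred from the expected output rather than stated in the question. A minimal sketch of the breakdown for row 1, assuming that interpretation and using collections.Counter:

```python
from collections import Counter

a = ["NNP", "NNP", "NNP", "NNP", "CC", "RB", "NN", "Z"]  # row 1, column A
b = ["NNP", "NNP", "VB", "NN", "NN", "Z"]                # row 1, column B

# Assumed rule: each shared tag is limited by the smaller of its two counts.
ca, cb = Counter(a), Counter(b)
shared = {tag: min(ca[tag], cb[tag]) for tag in ca if tag in cb}

print(shared)                # {'NNP': 2, 'NN': 1, 'Z': 1}
print(sum(shared.values()))  # 4
```

NNP appears 4 times in A but only 2 times in B, so it contributes 2; NN and Z contribute 1 each, giving 2 + 1 + 1 = 4.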
thenoirlatte

2 Answers


You can adapt the answer from this post (a multiset intersection with collections.Counter) and take the length of the result inside a list comprehension:

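from collections import Counter

import pandas as pd

df = pd.DataFrame({"column A":[["NNP","NNP","NNP","NNP","CC","RB","NN","Z"]],
                   "column B":[["NNP","NNP","VB","NN","NN","Z"]]})

# Counter(a) & Counter(b) keeps each shared string with the smaller of its two counts;
# elements() expands those counts back into items, so len() also counts duplicates.
df["result"] = [len(list((Counter(a) & Counter(b)).elements()))
                for a, b in zip(df["column A"], df["column B"])]

print(df)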

                              column A                   column B  result
0  [NNP, NNP, NNP, NNP, CC, RB, NN, Z]  [NNP, NNP, VB, NN, NN, Z]       4
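Because Counter(a) & Counter(b) already stores the minimum count for each shared key, the same number can be obtained by summing its values instead of building a list from elements(). A sketch of that variant on the same one-row frame (the column name result is kept from the answer above):

```python
from collections import Counter

import pandas as pd

df = pd.DataFrame({"column A": [["NNP", "NNP", "NNP", "NNP", "CC", "RB", "NN", "Z"]],
                   "column B": [["NNP", "NNP", "VB", "NN", "NN", "Z"]]})

# '&' on two Counters keeps each common key with the smaller of its two counts;
# summing those counts gives the number of shared items, duplicates included.
df["result"] = [sum((Counter(a) & Counter(b)).values())
                for a, b in zip(df["column A"], df["column B"])]

print(df["result"].tolist())  # [4]
```

The result is identical; summing the counts just avoids materializing an intermediate list for every row.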
Henry Yik

You can do that with the following code.

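# axis=1 passes each row to the lambda; set() drops duplicates, so this counts distinct shared strings.
df['count_same_string'] = df.apply(lambda row: len(set(row['column A']).intersection(row['column B'])), axis=1)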
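On row 1 of the question, a set intersection and a Counter intersection give different answers, because the set collapses the four NNP tags in column A and the two in column B into one. A small sketch comparing the two row-wise with DataFrame.apply(..., axis=1); the column name distinct_shared is hypothetical, chosen for illustration:

```python
from collections import Counter

import pandas as pd

df = pd.DataFrame({"column A": [["NNP", "NNP", "NNP", "NNP", "CC", "RB", "NN", "Z"]],
                   "column B": [["NNP", "NNP", "VB", "NN", "NN", "Z"]]})

# Distinct shared strings only: {'NNP', 'NN', 'Z'}.
df["distinct_shared"] = df.apply(
    lambda row: len(set(row["column A"]).intersection(row["column B"])), axis=1)

# Shared items counted with multiplicity, as in the question's expected output.
df["count_same_string"] = df.apply(
    lambda row: len(list((Counter(row["column A"]) & Counter(row["column B"])).elements())),
    axis=1)

print(df[["distinct_shared", "count_same_string"]].to_string(index=False))
```

The set version yields 3 for this row and the Counter version yields 4, which is the value the question asks for.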
Sin-seok Seo