I want to get the items that are common to three list columns of a dataframe, row by row.

   user_id  top_lang_owned                    top_lang_committed       top_lang_watched
1  21       [ruby, javascript, go]            [go, ruby]               [ruby, javascript, go]
2  38       [ruby, javascript, coffeescript]  [ruby, coffeescript]     [ruby, javascript, go]
3  108      [ruby, shell, go]                 NaN                      [ruby, javascript, go]
4  173      [ruby, javascript, shell]         [ruby, css, javascript]  [ruby, javascript, css]

The desired output is:

   user_id  top_lang_owned      top_lang_committed  top_lang_watched
1  21       [ruby, go]
2  38       [ruby]
3  108      NaN
4  173      [ruby, javascript]

How can I obtain this output?

babeyh
  • Please repeat [on topic](https://stackoverflow.com/help/on-topic) and [how to ask](https://stackoverflow.com/help/how-to-ask) from the [intro tour](https://stackoverflow.com/tour). "Show me how to solve this coding problem?" is off-topic for Stack Overflow. You have to make an honest attempt at the solution, and then ask a *specific* question about your implementation. Stack Overflow is not intended to replace existing tutorials and documentation. – Prune Nov 14 '20 at 21:11
  • This is simply the set intersection of the three columns. Where are you stuck? Please provide the expected [MRE](https://stackoverflow.com/help/minimal-reproducible-example). – Prune Nov 14 '20 at 21:12
  • I have a problem with NaN cells. Maybe it is simple for you, but I can't solve it. – babeyh Nov 14 '20 at 21:35
  • I found a question similar to mine, https://stackoverflow.com/questions/46427558/pandas-multiple-column-intersection, but its solution did not work for me. – babeyh Nov 14 '20 at 21:38
  • Not as such, but there are posting guidelines that lay out the expectations for questions and answers, derived from the stated purposes of this site. – Prune Nov 15 '20 at 00:14

1 Answer

Suppose you have a data frame that looks like this:

            a          b
0   [9, 6, 2]  [3, 2, 1]
1   [2, 7, 2]  [2, 4, 1]
2   [2, 3, 1]  [5, 7, 8]
3   [3, 6, 6]  [1, 6, 7]

def common_words(x):
    # intersect the two list cells of a row as sets
    a = set(x.a)
    b = set(x.b)
    return list(a & b)

df1 = df.copy()  # if you don't want to change the original

df1['c'] = df1.apply(lambda x: common_words(x), axis=1)  # add the list of common words as column c

df1 = df1[df1['c'].str.len() != 0]  # drop rows with an empty list

df1 = df1[["a", "c"]]  # take just the columns you want

Not very elegant, but it should work.
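The same idea can probably be stretched to the three columns from the question, including the NaN cells mentioned in the comments. The following is only a sketch under some assumptions: the cells are Python lists (not strings), a row with a NaN in any of the three columns should yield NaN, and a row with nothing in common should also yield NaN. The helper `common_items`, the constant `LANG_COLS`, and the output column name `common` are made up for illustration; the result keeps the order of the first column.

```python
import numpy as np
import pandas as pd

# Columns to intersect (names taken from the question).
LANG_COLS = ["top_lang_owned", "top_lang_committed", "top_lang_watched"]


def common_items(row):
    """Return the items shared by every list in the row, or NaN.

    Assumption: a row with any non-list cell (e.g. NaN), or with no
    shared items, is reported as NaN.
    """
    cells = [row[col] for col in LANG_COLS]
    if not all(isinstance(cell, list) for cell in cells):
        return np.nan
    shared = set(cells[0]).intersection(*cells[1:])
    # Keep the order in which the items appear in the first column.
    result = [item for item in cells[0] if item in shared]
    return result if result else np.nan


# Sample data rebuilt from the question.
df = pd.DataFrame({
    "user_id": [21, 38, 108, 173],
    "top_lang_owned": [
        ["ruby", "javascript", "go"],
        ["ruby", "javascript", "coffeescript"],
        ["ruby", "shell", "go"],
        ["ruby", "javascript", "shell"],
    ],
    "top_lang_committed": [
        ["go", "ruby"],
        ["ruby", "coffeescript"],
        np.nan,
        ["ruby", "css", "javascript"],
    ],
    "top_lang_watched": [
        ["ruby", "javascript", "go"],
        ["ruby", "javascript", "go"],
        ["ruby", "javascript", "go"],
        ["ruby", "javascript", "css"],
    ],
})

# Row-wise apply, as in the two-column version above.
df["common"] = df[LANG_COLS].apply(common_items, axis=1)
print(df[["user_id", "common"]])
```

If rows with a NaN should instead be intersected over the remaining columns, the `isinstance` check could be changed to skip non-list cells rather than return NaN.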

trigonom