0

I have been looking for a solution and trying out different approaches to solve the problem in the title but haven't succeeded unfortunately.

Here is an example of what I would like to do. Say you have a dataframe with four columns:

df = pd.DataFrame(
    {
        "c1": ["A", "B", "C", "D", "E", "F"],
        "c2": ["A", "B", "X", "D", "C", "E"],
        "c3": ["B", "C", "A", "X", "E", "G"],
        "c4": ["D", "E", "B", "C", "X", "A"],
    },
    index=[0, 1, 2, 3, 4, 5],
)

Now I would like to create a fifth column that will contain only the common elements of the other four:

df = pd.DataFrame(
{
        "c1": ["A", "B", "C", "D", "E", "F"],
        "c2": ["A", "B", "X", "D", "C", "E"],
        "c3": ["B", "C", "A", "X", "E", "G"],
        "c4": ["D", "E", "B", "C", "X", "A"],
        "c5": ["A", "B", "C", "E", NaN, NaN]
    },
    index=[0, 1, 2, 3, 4, 5],
)

In other words, if an element is in multiple columns from c1 to c4, include it in column c5.

This should be possible by using the merge function, but as I said haven't succeeded.

TylerH
  • 20,799
  • 66
  • 75
  • 101
Bora
  • 41
  • 4
  • please explain your logic, what do you mean by *that will contain only the common elements of the other four* – anky May 13 '21 at 19:25
  • Why is index row 3, C5 equal to 'E'? – Scott Boston May 13 '21 at 19:34
  • 2
    @ScottBoston the index is not really important in the first step. It can be reset after having the desired column. – Bora May 13 '21 at 19:51
  • "*This should be possible by using the merge function, but as I said haven't succeeded.*" This sentence seems out of place/unnecessary. *Why* do you think it should be possible using the `merge` function? – TylerH May 13 '21 at 20:21
  • Why do you want to create another column? What is the relationship of this variable with the other ones? Why can't it be on another dataframe? – Braiam May 13 '21 at 20:42
  • @Braiam It can also be another dataframe. – Bora May 13 '21 at 20:49
  • 1
    Possible duplicate https://stackoverflow.com/q/46556169/792066 – Braiam May 13 '21 at 20:53

1 Answers1

1

You can use functools reduce:

from functools import reduce
import numpy as np

df['col5'] = pd.Series(reduce(lambda x,y: (np.intersect1d(x, y)), df.T.values))

RESULT:

  c1 c2 c3 c4 col5
0  A  A  B  D    A
1  B  B  C  E    B
2  C  X  A  B    C
3  D  D  X  C    E
4  E  C  E  X  NaN
5  F  E  G  A  NaN
Nk03
  • 14,699
  • 2
  • 8
  • 22