I have been looking for a solution and trying out different approaches to solve the problem in the title but haven't succeeded unfortunately.
Here is an example of what I would like to do. Say you have a dataframe with four columns:
df = pd.DataFrame(
{
"c1": ["A", "B", "C", "D", "E", "F"],
"c2": ["A", "B", "X", "D", "C", "E"],
"c3": ["B", "C", "A", "X", "E", "G"],
"c4": ["D", "E", "B", "C", "X", "A"],
},
index=[0, 1, 2, 3, 4, 5],
)
Now I would like to create a fifth column that will contain only the common elements of the other four:
df = pd.DataFrame(
{
"c1": ["A", "B", "C", "D", "E", "F"],
"c2": ["A", "B", "X", "D", "C", "E"],
"c3": ["B", "C", "A", "X", "E", "G"],
"c4": ["D", "E", "B", "C", "X", "A"],
"c5": ["A", "B", "C", "E", NaN, NaN]
},
index=[0, 1, 2, 3, 4, 5],
)
In other words, if an element is in multiple columns from c1 to c4, include it in column c5.
This should be possible by using the merge function, but as I said haven't succeeded.