Say I have n dataframes
, in this example n = 3.
**df1**
A B C
0 True 3 21.0
1 True 1 23.0
2 False 2 25.0
3 False 4 25.5
4 False 1 25.0
5 True 0 26.0
**df2**
A B C
0 True 3 21.0
1 True 1 23.0
2 False 2 25.0
3 False 4 25.5
4 True 2 19.0
**df3**
A B C
0 True 3 21.0
1 True 2 23.0
2 False 2 25.0
3 False 1 25.0
4 False 4 25.5
5 True 0 27.50
**dfn** ...
I want one dataframe
that includes all the rows where the value in Column C appears in every dataframe
dfn. So this is a kind of the union
of the intersection
of dataframes
on a Column, in this case Column C. So for the above dataframes, the rows with 19.0, 26.0 and 27.50 don't make it to the final dataframe
which is:
**Expected df**
0 True 3 21.0
1 True 1 23.0
2 False 2 25.0
3 False 4 25.5
4 False 1 25.0
0 True 3 21.0
1 True 1 23.0
2 False 2 25.0
3 False 4 25.5
0 True 3 21.0
1 True 2 23.0
2 False 2 25.0
3 False 1 25.0
4 False 4 25.5
So a row lives on to the final dataframe
, if and only if, the value in Column C is seen in all dataframes
.
Reproducible code:
import pandas as pd
df1 = pd.DataFrame({'A': [True,True,False,False,False,True], 'B': [3,1,2,4,1,0],
'C': [21.0,23.0,25.0,25.5,25.0,26.0]})
df2 = pd.DataFrame({'A': [True,True,False,False,False], 'B': [3,1,2,4,2],
'C': [21.0,23.0,25.0,25.5,19.0]})
df3 = pd.DataFrame({'A': [True,True,False,False,False,True], 'B': [3,2,2,1,4,0],
'C': [21.0,23.0,25.0,25.0,25.5,27.5]})
dfn = ...