I am trying to extract common rows from several dataframes using pd.concat:
>>> import numpy as np
>>> import pandas as pd
>>> x = np.random.random(size=(5, 3))
>>> df1 = pd.DataFrame(x)
>>> df2 = pd.DataFrame(x[1:3])
>>> df3 = pd.DataFrame(x[2:4])
>>> df1
0 1 2
0 0.257662 0.453542 0.805230
1 0.060493 0.463148 0.715994
2 0.452379 0.470137 0.965709
3 0.447546 0.964252 0.163247
4 0.187212 0.973557 0.871090
>>> df2
0 1 2
0 0.060493 0.463148 0.715994
1 0.452379 0.470137 0.965709
>>> df3
0 1 2
0 0.452379 0.470137 0.965709
1 0.447546 0.964252 0.163247
As you can see, only the row 0.452379 0.470137 0.965709
is common to all three dataframes. To extract it, I tried:
>>> pd.concat([df1, df2, df3], join='inner')
0 1 2
0 0.257662 0.453542 0.805230
1 0.060493 0.463148 0.715994
2 0.452379 0.470137 0.965709
3 0.447546 0.964252 0.163247
4 0.187212 0.973557 0.871090
0 0.060493 0.463148 0.715994
1 0.452379 0.470137 0.965709
0 0.452379 0.470137 0.965709
1 0.447546 0.964252 0.163247
Thus, join==inner
doesn't seem to work! I should also point out that ignore_index=True
has no effect on the behavior. In an article on Real Python, using axis=1
is suggested. However, it is wrong in my opinion:
>>> pd.concat([df1, df2, df3], join='inner', axis=1)
0 1 2 0 1 2 0 1 2
0 0.257662 0.453542 0.805230 0.060493 0.463148 0.715994 0.452379 0.470137 0.965709
1 0.060493 0.463148 0.715994 0.452379 0.470137 0.965709 0.447546 0.964252 0.163247
What is wrong with what I am doing? Also, how would I extract common rows from several dataframes if this way doesn't work? I am using Pandas version 0.25.3.