I have two dataframes, each with a column called frame. The dataframes hold data extracted from two videos of two participants, recorded at the same time. Some frames are missing from each dataframe (different ones per video) because tracking failed. I want to take the intersection of the two dataframes based on the integer frame value, df['frame'].
A similar question is posted here: "Pandas - intersection of two data frames based on column entries", but the accepted answer is a join, not an intersection.
Example data
import pandas as pd
df1 = pd.DataFrame(data={'frame': [1, 2, 3]})
df2 = pd.DataFrame(data={'frame': [2, 3, 4]})
Desired output
Rows whose frame value is not in the intersection of df1['frame'] and df2['frame'] are removed:
>>> print(df1)
   frame
1      2
2      3
>>> print(df2)
   frame
0      2
1      3
(I can reset the index after I'm done processing with df1.reset_index(drop=True).)
Tried
I thought of first getting the intersection of the frames column of both dataframes with:
df1_idx = df1['frame']
df2_idx = df2['frame']
intersection_idx = df1_idx.intersection(df2_idx)
Error:
File "/*python_path*/site-packages/pandas/core/generic.py", line 3081, in __getattr__
return object.__getattribute__(self, name)
AttributeError: 'Series' object has no attribute 'intersection'
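As far as I can tell, intersection is a method of pd.Index rather than of Series. A minimal sketch of this step, assuming the frame values are wrapped in a pd.Index first (the wrapping is my assumption, not something I have confirmed is the recommended way):

```python
import pandas as pd

df1 = pd.DataFrame(data={'frame': [1, 2, 3]})
df2 = pd.DataFrame(data={'frame': [2, 3, 4]})

# Wrap each column in a pd.Index, which (unlike Series) has set-style methods.
df1_idx = pd.Index(df1['frame'])
df2_idx = pd.Index(df2['frame'])

# Frame values present in both dataframes.
intersection_idx = df1_idx.intersection(df2_idx)
print(list(intersection_idx))
```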
After getting the frame indexes that are in both dataframes, I was thinking of doing something like (mentioned in: dropping rows from dataframe based on a "not in" condition):
df1 = df1.drop(df1[~df1['frame'].isin(intersection_idx)].index)
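A possibly simpler variant of the same idea, sketched under the assumption that a boolean mask from isin is enough and no separate intersection object is needed. The names df1_common and df2_common are only illustrative:

```python
import pandas as pd

df1 = pd.DataFrame(data={'frame': [1, 2, 3]})
df2 = pd.DataFrame(data={'frame': [2, 3, 4]})

# Keep only the rows whose frame value also appears in the other dataframe.
# The original index labels are preserved, matching the desired output.
df1_common = df1[df1['frame'].isin(df2['frame'])]
df2_common = df2[df2['frame'].isin(df1['frame'])]

print(df1_common)
print(df2_common)
```

Unlike a join, this keeps the two dataframes separate and only drops the rows that have no counterpart in the other one.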
System
Python 3.6.5 with pandas 0.22.0 installed with Anaconda.