
I have 2 dataframes, each with a column called `frame`. They hold data extracted from two videos of two participants that were recorded at the same time. Some frames are missing from the data (different ones per video) because tracking failed. I want to take an intersection based on the integer frame value, df['frame'], so that each dataframe keeps only the frames present in both.

A similar question has been posted here: Pandas - intersection of two data frames based on column entries, but the accepted answer there is a join, not an intersection.

Example data

import pandas as pd

df1 = pd.DataFrame(data={'frame': [1, 2, 3]})
df2 = pd.DataFrame(data={'frame': [2, 3, 4]})

Desired output

Rows whose frame is not in the intersection of df1['frame'] and df2['frame'] are removed:

>>> print(df1)
   frame
1      2
2      3

>>> print(df2)
   frame
0      2
1      3

(I can reset the index with df1.reset_index(drop=True) once I'm done processing.)

Tried

I thought of first getting the intersection of the frame column of both dataframes with:

df1_idx = df1['frame']
df2_idx = df2['frame']
intersection_idx = df1_idx.intersection(df2_idx)

Error:

File "/*python_path*/site-packages/pandas/core/generic.py", line 3081, in __getattr__
    return object.__getattribute__(self, name)
AttributeError: 'Series' object has no attribute 'intersection'
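The error occurs because `intersection` is a method of `pd.Index`, not of `pd.Series`. A minimal sketch, assuming the example data above: wrap the column in an `Index` first, or fall back to plain Python sets.

```python
import pandas as pd

df1 = pd.DataFrame(data={'frame': [1, 2, 3]})
df2 = pd.DataFrame(data={'frame': [2, 3, 4]})

# pd.Index (unlike pd.Series) provides .intersection(); it accepts any array-like
intersection_idx = pd.Index(df1['frame']).intersection(df2['frame'])
print(intersection_idx.tolist())  # [2, 3]

# Equivalent with plain Python sets; sets are unordered, so sort the result
common = sorted(set(df1['frame'].tolist()) & set(df2['frame'].tolist()))
print(common)  # [2, 3]
```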

After getting the frame numbers that are in both dataframes, I was thinking of doing something like the following (based on: dropping rows from dataframe based on a "not in" condition):

df1 = df1.drop(df1[~df1['frame'].isin(intersection_idx)].index)
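Put together, the drop-based plan can be sketched as below, assuming the example data from the question. `DataFrame.drop()` expects index labels rather than a DataFrame, which is why the rows to remove are passed via `.index`.

```python
import pandas as pd

df1 = pd.DataFrame(data={'frame': [1, 2, 3]})
df2 = pd.DataFrame(data={'frame': [2, 3, 4]})

# Frame numbers present in both videos
intersection_idx = pd.Index(df1['frame']).intersection(df2['frame'])

# Select the rows NOT in the intersection and drop them by their index labels
df1 = df1.drop(df1[~df1['frame'].isin(intersection_idx)].index)
df2 = df2.drop(df2[~df2['frame'].isin(intersection_idx)].index)

print(df1)
print(df2)
```

This reproduces the desired output above, including the original index labels (1, 2 for df1 and 0, 1 for df2).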

System

Python 3.6.5 with pandas 0.22.0 installed with Anaconda.

NumesSanguis

1 Answer


What about:

df1[df1.frame.isin(df2.frame)]
Out: 
   frame
1      2
2      3

df2[df2.frame.isin(df1.frame)]
Out: 
   frame
0      2
1      3
SpghttCd
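Each `isin` mask keeps the original index labels, matching the desired output in the question, and no explicit intersection is needed because each frame is filtered against the other's column directly. A minimal sketch wrapping the answer in a helper; the name `keep_common_frames` and its `reset` parameter are hypothetical, with `reset=True` applying the `reset_index(drop=True)` step mentioned in the question:

```python
import pandas as pd

def keep_common_frames(a, b, col='frame', reset=False):
    """Return a and b filtered to rows whose `col` value occurs in both (hypothetical helper)."""
    a_out = a[a[col].isin(b[col])]
    b_out = b[b[col].isin(a[col])]
    if reset:
        # Renumber rows 0..n-1 and discard the old index labels
        a_out = a_out.reset_index(drop=True)
        b_out = b_out.reset_index(drop=True)
    return a_out, b_out

df1 = pd.DataFrame(data={'frame': [1, 2, 3]})
df2 = pd.DataFrame(data={'frame': [2, 3, 4]})

df1, df2 = keep_common_frames(df1, df2)
print(df1)
print(df2)
```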