
I am trying to get the intersection of the 'game' and 'sample' dataframes, where two rows match if all their values are equal. The dataframes have different sizes, and I don't want a row to be counted twice in the intersection.

For example, the sample dataframe has rows [0,1,1],[1,1,0],[1,0,1],[0,1,1]

and the game dataframe has rows [1,1,0],[1,1,0],[1,0,1],[1,1,1],[1,0,1].

The intersection dataframe should then contain the rows [1,1,0] and [1,0,1], each only once.
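In plain Python, the intended result can be sketched as follows. This assumes "not counted twice" means each distinct row appears once in the output, in the order it first occurs in sample; the names `sample_rows`, `game_rows` and `intersection` are only illustrative.

```python
# Rows from the example above, stored as tuples so they can be hashed
sample_rows = [(0, 1, 1), (1, 1, 0), (1, 0, 1), (0, 1, 1)]
game_rows = [(1, 1, 0), (1, 1, 0), (1, 0, 1), (1, 1, 1), (1, 0, 1)]

game_set = set(game_rows)  # membership test ignores row position and duplicates
seen = set()               # rows already added, so none is counted twice
intersection = []
for row in sample_rows:    # iterate in sample order
    if row in game_set and row not in seen:
        intersection.append(row)
        seen.add(row)

print(intersection)  # [(1, 1, 0), (1, 0, 1)]
```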

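```python
import pandas as pd
import numpy as np
import random

trials = 1000
games = 3

# data: one row per trial, each cell a random 0 or 1
data = pd.DataFrame()
for i in range(trials):
    for j in range(games):
        data.loc[i,j] = random.choice([0,1])

# sample: trials with at least two 1s
sample = pd.DataFrame()
for i in range(trials):
    for j in range(games):
        if ((data.loc[i,:]).sum()) >= 2:
            sample.loc[i,j] = data.loc[i,j]

# game: trials whose first cell is 1
game = pd.DataFrame()
for i in range(trials):
    for j in range(games):
        if (data.loc[i,0]) == 1:
            game.loc[i,j] = data.loc[i,j]

# Compares row i of sample with row i of game by position only, so equal rows
# at different positions are missed, and game.iloc[i,:] raises IndexError
# when game has fewer rows than sample.
intersection = pd.DataFrame()
for i in range(len(sample)):
    if np.all(sample.iloc[i,:] == game.iloc[i,:]):
        for j in range(games):
            intersection.loc[i,j] = sample.loc[i,j]
```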

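For reference, the same three dataframes can be built without cell-by-cell `.loc` assignment. This is a vectorized sketch that assumes NumPy's `Generator.integers` is an acceptable substitute for `random.choice`; the seed 0 is arbitrary and only there for reproducibility. The boolean masks select the same trials as the loops above and keep the trial numbers as index labels; the cells here are integers.

```python
import numpy as np
import pandas as pd

trials = 1000
games = 3

rng = np.random.default_rng(0)  # arbitrary seed, for reproducible runs
data = pd.DataFrame(rng.integers(0, 2, size=(trials, games)))  # random 0/1 cells

# Boolean masks replace the nested loops; index labels (trial numbers) are kept
sample = data[data.sum(axis=1) >= 2]  # at least two 1s
game = data[data[0] == 1]             # first cell is 1
```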

Naga kiran
  • Does this answer your question? [Pandas - intersection of two data frames based on column entries](https://stackoverflow.com/questions/26921943/pandas-intersection-of-two-data-frames-based-on-column-entries) – dspencer Mar 11 '20 at 04:34

1 Answer


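You can check which rows of the first dataframe also appear in the second with pandas' pd.DataFrame.isin. Note that when isin is given a DataFrame, it aligns on both index and column labels, so each row is compared only with the row that has the same index label in the other dataframe. It gives the expected result here because the matching rows sit at index 1 and 2 in both dataframes.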

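```python
import pandas as pd

df1 = pd.DataFrame([[0,1,1],[1,1,0],[1,0,1],[0,1,1]])
df2 = pd.DataFrame([[1,1,0],[1,1,0],[1,0,1],[1,1,1],[1,0,1]])

# Keep rows of df1 whose every cell equals df2 at the same index and column label
df1[df1.isin(df2).all(axis=1)]
```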

Out:

```
    0   1   2
1   1   1   0
2   1   0   1
```
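An alternative that matches rows by value regardless of where they sit in each dataframe is an inner merge on all shared columns followed by `drop_duplicates`. This is a sketch, not part of the original answer; `row_intersection` is a hypothetical helper name, and the small frames `a` and `b` are made up to show a case where the same rows appear in a different order.

```python
import pandas as pd

df1 = pd.DataFrame([[0,1,1],[1,1,0],[1,0,1],[0,1,1]])
df2 = pd.DataFrame([[1,1,0],[1,1,0],[1,0,1],[1,1,1],[1,0,1]])

def row_intersection(left, right):
    """Distinct rows of `left` that also occur anywhere in `right` (hypothetical helper)."""
    # With no `on`, merge joins on every column the frames share, i.e. on whole rows.
    # De-duplicating `right` avoids multiplying matches; the final drop_duplicates
    # keeps each matching row of `left` only once.
    return left.merge(right.drop_duplicates(), how="inner").drop_duplicates()

print(row_intersection(df1, df2))

# Same two rows, opposite order: isin compares by label and finds nothing,
# while the merge still finds both rows.
a = pd.DataFrame([[1,1,0],[1,0,1]])
b = pd.DataFrame([[1,0,1],[1,1,0]])
print(a[a.isin(b).all(axis=1)])  # empty DataFrame
print(row_intersection(a, b))
```

The merge result gets a fresh 0-based index, so the original row labels of the left dataframe are not kept, and both dataframes need the same column labels for the join to cover every column.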
Naga kiran