Python pandas dataframe: delete rows where value in column exists in another

Question

I have the following pandas dataframe:

[image: screenshot of the dataframe]

and would like to remove the duplicate rows, where the same two teams appear in swapped order.

For example, Atlanta Falcons/Jacksonville Jaguars also appears as Jacksonville Jaguars/Atlanta Falcons.

What is the best way to do so?

Thanks!

Hello! It'll help people to help you if you post your data in a reproducible format, not as a screenshot or image. There is a great post on Stack Overflow about [How to make good reproducible pandas examples](https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples) that you should check out and then edit your post based off of it. — scotscotmcc, Nov 30 '21 at 02:10

score 3 · Answer 1 · answered Nov 30 '21 at 02:57

The following should do the trick:

import numpy as np

# Put each pair of teams in a consistent (alphabetical) order
df["team_a"] = np.minimum(df["team1"], df["team2"])
df["team_b"] = np.maximum(df["team1"], df["team2"])

# Drop rows that repeat the same game, then remove the helper columns
df.drop_duplicates(["season", "week", "team_a", "team_b"], inplace=True)
df.drop(columns=["team_a", "team_b"], inplace=True)

Before doing this, please check your data: where team1 and team2 are swapped, the team1_score and team2_score columns are not swapped with them, so the result may be confusing after you remove one of the rows.
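One possible way to avoid that confusion is to rewrite every row into a single canonical form, swapping the scores together with the teams, before dropping duplicates. The sketch below is only an illustration: the sample values are invented, the column names team1_score and team2_score are taken from the note above, and it assumes each row's scores belong to that row's own team1 and team2.

```python
import numpy as np
import pandas as pd

# Invented sample data: rows 0 and 2 are the same game with the teams swapped.
df = pd.DataFrame({
    "season": [2021, 2021, 2021],
    "week": [12, 12, 12],
    "team1": ["Atlanta Falcons", "Dallas Cowboys", "Jacksonville Jaguars"],
    "team2": ["Jacksonville Jaguars", "Las Vegas Raiders", "Atlanta Falcons"],
    "team1_score": [21, 33, 14],
    "team2_score": [14, 36, 21],
})

# True where the row lists the two teams in reverse alphabetical order.
swapped = (df["team1"] > df["team2"]).to_numpy()

# On those rows, swap the team names and the scores together,
# so every game ends up with the alphabetically smaller team as team1.
canonical = df.assign(
    team1=np.where(swapped, df["team2"], df["team1"]),
    team2=np.where(swapped, df["team1"], df["team2"]),
    team1_score=np.where(swapped, df["team2_score"], df["team1_score"]),
    team2_score=np.where(swapped, df["team1_score"], df["team2_score"]),
)

# Mirrored rows are now identical in the key columns, so one of them is dropped.
deduped = canonical.drop_duplicates(["season", "week", "team1", "team2"]).reset_index(drop=True)
print(deduped)
```

If the mirrored rows in the real data do not carry mirrored scores, this normalization would not make sense as written, so it is worth inspecting a few pairs by hand first.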

tomathon · Answer 2 · 2021-11-30T02:58:27.633

Because OP did not provide a reproducible dataset:

import pandas as pd

# dataset where the 1st and 5th observations are team A vs team F:
df = pd.DataFrame({
    "season": [2021, 2021, 2021, 2021, 2021, 2021, 2021, 2021, 2021, 2021],
    "week": [12, 12, 12, 12, 12, 13, 13, 13, 13, 13],
    "team1": ["A", "B", "C", "D", "F", "A", "B", "C", "D", "F"],
    "team2": ["F", "G", "H", "I", "A", "F", "G", "H", "I", "A"]
})

df
    season  week    team1   team2
0     2021    12        A       F
1     2021    12        B       G
2     2021    12        C       H
3     2021    12        D       I
4     2021    12        F       A
5     2021    13        A       F
6     2021    13        B       G
7     2021    13        C       H
8     2021    13        D       I
9     2021    13        F       A

# solution:
df[[df["team1"].str.contains(c) == False for c in df["team2"].tolist()][0]]
    season  week    team1   team2
0     2021    12        A       F
1     2021    12        B       G
2     2021    12        C       H
3     2021    12        D       I
5     2021    13        A       F
6     2021    13        B       G
7     2021    13        C       H
8     2021    13        D       I
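Note that the trailing [0] in the one-liner above selects only the first mask from the list, so the filter is effectively "team1 does not contain the first team2 value" ("F" here). That is enough for this toy data, but it may not carry over to a real schedule. A minimal sketch of a variant that compares each matchup within its own season and week, assuming the same column names (the `pair` key is a hypothetical helper introduced only for illustration):

```python
import pandas as pd

# Same toy data as above: within each week, A vs F also appears as F vs A.
df = pd.DataFrame({
    "season": [2021] * 10,
    "week": [12] * 5 + [13] * 5,
    "team1": ["A", "B", "C", "D", "F"] * 2,
    "team2": ["F", "G", "H", "I", "A"] * 2,
})

# Hypothetical helper: an order-independent key for each matchup,
# e.g. ("A", "F") for both A vs F and F vs A.
pair = pd.Series(
    [tuple(sorted(teams)) for teams in zip(df["team1"], df["team2"])],
    index=df.index,
)

# A row is a repeat if the same pair already occurred in that season and week.
is_repeat = df.assign(pair=pair).duplicated(["season", "week", "pair"])

# Keep only the first occurrence of each matchup; the original frame is not modified.
result = df[~is_repeat]
print(result)
```

On the toy data this keeps the rows labelled 0 to 3 and 5 to 8; the labels are the original ones because nothing resets the index.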

sorry for not posting a reproducible dataset, this is read in as a df using pd.read_csv for all NFL data, I tried just running the last line and it doesn't seem to do the trick? — johndoe1839, Nov 30 '21 at 02:47
I don't know what to tell you. I just edited my code example to reflect multiple weeks (12 and 13) and my code still works. — tomathon, Nov 30 '21 at 02:59
No problem! If my answer worked for you please mark it with the checkmark (if not, no worries). — tomathon, Nov 30 '21 at 03:48
