I have a dataset in a CSV file with a field named "Episode", where we store data for future sport events. For the same date we have both "INDIA VS PAKISTAN" and "PAKISTAN VS INDIA". Is there any option to delete the duplicate?
Thanks in advance
One idea would be to use the pandas rank method, grouping by the needed columns:
# "Column_1" / "Column_2" are placeholders: group by the column(s) that define a duplicate
df["RANK"] = df.groupby("Column_1")["Column_2"].rank(method="first", ascending=True)
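This ranks the rows within each group, so three duplicate rows are ranked 1, 2 and 3 respectively. From there, take the subset of the dataframe where RANK == 1, which gives you a dataframe with no dupes. For the question's case, the grouping key must treat "INDIA VS PAKISTAN" and "PAKISTAN VS INDIA" as the same value (for example the date plus a normalized match name), because the raw strings differ.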
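A minimal sketch of the rank approach applied to the question's data. It assumes the CSV has a "Date" column next to "Episode" (the column name is an assumption), builds a hypothetical order-independent key by sorting the words of each Episode, and ranks a numeric row-number helper rather than the text itself so that `rank(method="first")` works on plain numbers.

```python
import pandas as pd

# Hypothetical sample data; "Date" is an assumed column name.
df = pd.DataFrame({
    "Date": ["2024-06-09", "2024-06-09", "2024-06-10"],
    "Episode": ["INDIA VS PAKISTAN", "PAKISTAN VS INDIA", "INDIA VS AUSTRALIA"],
})

# Order-independent key: sort the (upper-cased) words so both orderings match.
df["match_key"] = df["Episode"].apply(lambda s: " ".join(sorted(s.upper().split())))

# Numeric helper column so the rank is computed on numbers, in original row order.
df["row_no"] = range(len(df))

# Rank rows within each (Date, match_key) group: 1, 2, 3, ...
df["RANK"] = df.groupby(["Date", "match_key"])["row_no"].rank(method="first", ascending=True)

# Keep the first row of each group, then drop the helper columns.
deduped = df[df["RANK"] == 1].drop(columns=["match_key", "row_no", "RANK"])
print(deduped)
```

Rows 0 and 2 survive: the second fixture on 2024-06-09 is the same match in reverse order, so it gets rank 2 and is filtered out.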
Create a new match column, then use drop_duplicates:
import pandas as pd

# sample df
df = pd.DataFrame({'a': [1,1,1,1,1],
'b': ['Bulldogs at Aztecs', 'Aztecs at Bulldogs', 'Bearcats at Huskies', 'Huskies at Bearcats', 'something else']})
# list comprehension: sort the words in each string so word order no longer matters
df['match'] = [' '.join(sorted(x.split())) for x in df['b'].values]
#    a                    b                match
# 0  1   Bulldogs at Aztecs   Aztecs Bulldogs at
# 1  1   Aztecs at Bulldogs   Aztecs Bulldogs at
# 2  1  Bearcats at Huskies  Bearcats Huskies at
# 3  1  Huskies at Bearcats  Bearcats Huskies at
# 4  1       something else       else something
# drop rows that share the same 'a' and 'match', keeping the first one, then remove the helper column
df = df.drop_duplicates(['a', 'match'], keep='first').drop(columns='match')
#    a                    b
# 0  1   Bulldogs at Aztecs
# 2  1  Bearcats at Huskies
# 4  1       something else
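A sketch of the same match-column idea run end to end on CSV data, as in the question. An in-memory string stands in for the real file, and the "Date" column name and the file name "events.csv" are hypothetical. Upper-casing before sorting is an added assumption so that "India vs Pakistan" also matches.

```python
import io
import pandas as pd

# Stand-in for the question's CSV file; the "Date" column name is an assumption.
csv_text = """Date,Episode
2024-06-09,INDIA VS PAKISTAN
2024-06-09,PAKISTAN VS INDIA
2024-06-10,PAKISTAN VS INDIA
"""
df = pd.read_csv(io.StringIO(csv_text))  # with a real file: pd.read_csv("events.csv") (hypothetical name)

# Sort the words of Episode so word order (and case) no longer matters.
df["match"] = [" ".join(sorted(x.upper().split())) for x in df["Episode"]]

# A fixture counts as a duplicate only when it falls on the same date.
df = df.drop_duplicates(["Date", "match"], keep="first").drop(columns="match")

# Write the cleaned data back out; index=False avoids an extra index column.
out = io.StringIO()
df.to_csv(out, index=False)
print(out.getvalue())
```

Including "Date" in the subset matters: the 2024-06-10 fixture is the same pairing but a different event, so it is kept.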