I want to find duplicate items across two columns in Excel, i.e. the rows where both columns hold the same value. For example, my Excel sheet consists of:

    list_A  list_B
0   ideal   ideal
1   brown   colour
2   blue    blew
3   red     red

I checked the pandas documentation and tried the `duplicated` method, but I don't understand why it keeps returning an empty DataFrame. It finds both columns, and I assume it iterates over them, so why doesn't it find the values and compare them?

I also tried using iterrows but honestly don't know how to implement it.

When running the code I get this output:

    Empty DataFrame
    Columns: [list A, list B]
    Index: []

    import pandas as pd

    pt = pd.read_excel(r"C:\Users\S531\Desktop\pt.xlsx")
    dfObj = pd.DataFrame(pt)
    doubles = dfObj[dfObj.duplicated()]
    print(doubles)

The output I'm looking for is:

    list_A  list_B
0   ideal   ideal
3   red     red

Final solved code looks like this:

    import pandas as pd

    pt = pd.read_excel(r"C:\Users\S531\Desktop\pt.xlsx")
    doubles = pt[pt['list_A'] == pt['list_B']]
    print(doubles)
rpanai
Wondarar
  • `pt` should already be a `DataFrame` object, so no reason for `dfObj = pd.DataFrame(pt)` ... what do you get when you print `pt`? – Itamar Mushkin Sep 01 '19 at 08:28
  • when I add print(pt) I get: list A list B 0 ideal ideal 1 brown colour 2 blue blew 3 red red – Wondarar Sep 01 '19 at 08:32
  • Okay, now we know your example input is read properly. I've suggested an edit, so it's easy to see the dataframe and copy it. Please visit https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples – Itamar Mushkin Sep 01 '19 at 09:04

1 Answer

The term "duplicate" usually means rows that are exact duplicates of previous rows (see the documentation of `pd.DataFrame.duplicated`). None of your rows repeats an earlier row, which is why the result is empty.

What you are looking for is just the rows where these two columns are equal. For that, you want:

    doubles = pt[pt['list_A'] == pt['list_B']]
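To make the difference concrete, here is a minimal sketch that runs both approaches side by side. It assumes the sample data from the question, built in memory rather than read from the Excel file, and the name `repeated_rows` is only illustrative.

```python
import pandas as pd

# Sample data copied from the question (assumption: built in memory
# here instead of being read from the Excel file).
pt = pd.DataFrame({
    "list_A": ["ideal", "brown", "blue", "red"],
    "list_B": ["ideal", "colour", "blew", "red"],
})

# duplicated() flags rows that repeat an EARLIER row in full.
# No row repeats another row here, so the selection is empty.
repeated_rows = pt[pt.duplicated()]

# Comparing the two columns element-wise gives one boolean per row;
# indexing with that mask keeps the rows where both values match.
doubles = pt[pt["list_A"] == pt["list_B"]]

print(repeated_rows)
print(doubles)
```

The first print shows an empty DataFrame, as in the question; the second keeps the rows with index 0 and 3.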

Itamar Mushkin
  • Thanks for your really fast replies. Yes, correct, I haven't been precise enough. What does the final code look like? Do I have to assign it to my "doubles" variable and finally just print(doubles)? I tried it but got a bunch of traceback messages... – Wondarar Sep 01 '19 at 09:20
  • yes, that would work. try `doubles = pt[pt['list_A'] == pt['list_B']]` and printing `doubles`. – Itamar Mushkin Sep 01 '19 at 09:25
  • If you're getting a bunch of traceback messages, it might be that I made an error in my answer - what error messages are you getting? – Itamar Mushkin Sep 01 '19 at 09:26
  • Solved, thank you very much. I got an error message because in your edit you added an underscore to list_A. I didn't see this detail immediately. And I changed "df" to "pt" since my variable is called pt. – Wondarar Sep 02 '19 at 18:14
  • You're welcome. I've edited from `df` to `pt` for clarity. If this answer answers your question, please mark it as accepted (with the little "v" sign), so that others will know it is answered. – Itamar Mushkin Sep 03 '19 at 06:43
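
The comments suggest the traceback came from a header mismatch: the printed output shows the columns as `list A` and `list B`, while the answer indexes `list_A` and `list_B`. The sketch below is one possible way to check for and handle this. It assumes the headers really contain spaces, and the frame is built in memory for illustration; the names `renamed` and `doubles_renamed` are hypothetical.

```python
import pandas as pd

# Hypothetical frame whose headers contain spaces, mimicking the
# "Columns: [list A, list B]" output shown in the question.
pt = pd.DataFrame({
    "list A": ["ideal", "brown", "blue", "red"],
    "list B": ["ideal", "colour", "blew", "red"],
})

# Inspect the exact header strings before indexing by name.
print(pt.columns.tolist())

# Option 1: index with the headers exactly as they appear.
doubles = pt[pt["list A"] == pt["list B"]]

# Option 2: normalise the headers first (spaces -> underscores),
# then use the underscore names from the answer.
renamed = pt.rename(columns=lambda name: name.strip().replace(" ", "_"))
doubles_renamed = renamed[renamed["list_A"] == renamed["list_B"]]

print(doubles)
print(doubles_renamed)
```

Either option selects the same rows; renaming is mainly useful when the rest of the code already expects the underscore names.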