Find all row entries with the same entry in another column using pandas

Asked Apr 28 '20 at 10:12

Active Apr 28 '20 at 10:27


Currently I have a very large csv with roughly the structure below.

Id  label      attribute  link
1   usernamea  500        https://someurl.com
2   usernameb  422        https://someurl.com
3   usernamec  4422       https://anotherurl.com

I'm trying to find a way, using Python and pandas, to select all the instances where two people have shared the same link and then write those pairs to a new csv by their Id. Worth mentioning: I created the Id column as an index when using df.to_csv to create this csv.

So for example, going off the sample above, the output I need would be:

source  target
1       2

Since usernamea and usernameb both shared https://someurl.com.

Any thoughts on this?

edited Apr 28 '20 at 10:27

Tonechas


asked Apr 28 '20 at 10:12

osint_alex

Does it always come in pairs? Is there a chance that more than two users have the same link? – dzakyputra Apr 28 '20 at 10:21
There are often more than two users with the same link, and I need all those instances. I should say it doesn't matter what order the 1,2 are in in the new csv, i.e. it isn't a problem if the output is 2,1. – osint_alex Apr 28 '20 at 10:35
Does this answer your question? https://stackoverflow.com/questions/22219004/grouping-rows-in-list-in-pandas-groupby – dzakyputra Apr 28 '20 at 10:44
Yes it did, thank you! df.groupby('link')['label'].unique() solved this for me – osint_alex Apr 28 '20 at 17:44
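
For anyone landing here later, a minimal sketch of how the groupby approach from the comments could be extended to the source/target output asked for in the question. It is not the asker's actual code: the toy DataFrame, the names ids_per_link, pairs and edges, and the use of itertools.combinations to expand each group into pairs are all assumptions made for illustration. It groups on Id rather than label because the desired output is keyed by Id.

```python
from itertools import combinations

import pandas as pd

# Toy data mirroring the question; Id is the index, as described there.
df = pd.DataFrame(
    {
        "label": ["usernamea", "usernameb", "usernamec"],
        "attribute": [500, 422, 4422],
        "link": [
            "https://someurl.com",
            "https://someurl.com",
            "https://anotherurl.com",
        ],
    },
    index=pd.Index([1, 2, 3], name="Id"),
)

# Collect the distinct Ids that share each link (one array per link).
ids_per_link = df.reset_index().groupby("link")["Id"].unique()

# Expand every group into all unordered pairs; groups with a single Id
# yield no pairs, and groups with three or more Ids yield every combination.
pairs = [
    pair
    for ids in ids_per_link
    for pair in combinations(sorted(ids), 2)
]

edges = pd.DataFrame(pairs, columns=["source", "target"])

# With no path argument, to_csv returns the csv as a string;
# pass a filename instead to write it to disk.
csv_text = edges.to_csv(index=False)
```

On the sample rows this leaves a single source/target row pairing Id 1 with Id 2. Note that a link shared by n users produces n*(n-1)/2 pairs, so on a very large csv with heavily shared links the output can grow quickly.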
