I have a very large CSV with roughly the following structure:

Id  label      attribute  link
1   usernamea  500        https://someurl.com
2   usernameb  422        https://someurl.com
3   usernamec  4422       https://anotherurl.com

I'm trying to find a way, using Python and pandas, to select all the instances where two people have shared the same link and then write those pairs to a new CSV by their Id. It's worth mentioning that I created the Id column as the index when using df.to_csv to create this CSV.

For example, given the sample above, the output I need would be:

source  target
1       2

This is because usernamea and usernameb both shared https://someurl.com.

Any thoughts on this?

  • Does it always come in pairs? Is there a chance that more than two users share the same link? – dzakyputra Apr 28 '20 at 10:21
  • There are often more than two users with the same link, and I need all those instances. I should say that the order of the pair in the new CSV doesn't matter, i.e. it isn't a problem if the output is 2,1 instead of 1,2. – osint_alex Apr 28 '20 at 10:35
  • Does this answer your question? https://stackoverflow.com/questions/22219004/grouping-rows-in-list-in-pandas-groupby – dzakyputra Apr 28 '20 at 10:44
  • Yes, it did, thank you! df.groupby('link')['label'].unique() solved this for me. – osint_alex Apr 28 '20 at 17:44
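
For anyone landing here later, the groupby idea from the comments can be taken one step further to get the source/target pairs asked for in the question. This is only a minimal sketch under some assumptions: the sample frame is built inline from the rows in the question (a real file would be loaded with pd.read_csv), Id is an ordinary column, and the helper name shared_link_pairs is made up for illustration. It groups the Ids by link and emits every unordered pair within each group using itertools.combinations.

```python
from itertools import combinations

import pandas as pd

# Sample data copied from the question; a real file would be read with pd.read_csv.
df = pd.DataFrame(
    {
        "Id": [1, 2, 3],
        "label": ["usernamea", "usernameb", "usernamec"],
        "attribute": [500, 422, 4422],
        "link": [
            "https://someurl.com",
            "https://someurl.com",
            "https://anotherurl.com",
        ],
    }
)


def shared_link_pairs(frame):
    """Return one (source, target) row per pair of Ids that share a link.

    Hypothetical helper, not part of pandas.
    """
    rows = []
    # Collect the distinct Ids seen for each link.
    for ids in frame.groupby("link")["Id"].unique():
        # A link shared by n Ids yields n * (n - 1) / 2 unordered pairs;
        # a link with a single Id yields none.
        rows.extend(combinations(sorted(ids), 2))
    return pd.DataFrame(rows, columns=["source", "target"])


pairs = shared_link_pairs(df)
print(pairs)
```

The result can then be written out with pairs.to_csv(...), passing index=False to keep only the source and target columns. Note that the number of pairs grows quadratically with the number of users per link, so a very popular link in a very large CSV could make the output much bigger than the input.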
