I have a very large CSV structured roughly like this:
Id label attribute link
1 usernamea 500 https://someurl.com
2 usernameb 422 https://someurl.com
3 usernamec 4422 https://anotherurl.com
I'm trying to find a way, using Python and pandas, to select all the instances where two people have shared the same link and then write those pairs to a new CSV by their Id. Worth mentioning: the Id column is the DataFrame index that was written out when I used df.to_csv to create this CSV.
So, going off the sample above, the output I need would be:
source target
1 2
Since usernamea and usernameb both shared https://someurl.com.
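To make the goal concrete, here is a rough sketch of the kind of thing I have in mind: a self-merge on the link column, keeping each pair of Ids once. The inline sample stands in for the real file, the comma delimiter is an assumption, and the variable names (pairs, edges) are just placeholders. I suspect this may not scale well if a single link is shared by many users, since the merge produces every combination for that link.

```python
import io

import pandas as pd

# Inline sample standing in for the real CSV (assumed comma-delimited).
csv_text = """Id,label,attribute,link
1,usernamea,500,https://someurl.com
2,usernameb,422,https://someurl.com
3,usernamec,4422,https://anotherurl.com
"""
df = pd.read_csv(io.StringIO(csv_text))

# Self-merge on link: pairs every row with every row sharing the same link.
ids = df[["Id", "link"]]
pairs = ids.merge(ids, on="link", suffixes=("_source", "_target"))

# Drop self-pairs and mirrored duplicates by keeping only source < target.
pairs = pairs[pairs["Id_source"] < pairs["Id_target"]]

# Keep just the two Id columns, renamed to the desired output headers.
edges = (
    pairs.rename(columns={"Id_source": "source", "Id_target": "target"})
    [["source", "target"]]
    .drop_duplicates()
    .reset_index(drop=True)
)

# Print as CSV text; passing a file path to to_csv would write it to disk instead.
print(edges.to_csv(index=False))
```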
Any thoughts on this?