I have a CSV file with 120,000 rows in the following format:
ID Duplicate
1 65
2 67
4 12
4 53
4 101
12 4
12 53
101 ...
This list specifies user IDs together with the users that are duplicates of each one. Because of how the list is laid out, I can't really filter it in Excel, so I am trying to transform it into this:
[1, 65]
[2, 67]
[4, 12, 53, 101]
Afterwards I would be able to write a new CSV that drops only list[0] from each block, so that I retain one user per "duplicate user block". In Excel I would then delete all the remaining user IDs.
However, I ran into a problem getting to this point:
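To make that last step concrete, here is a minimal sketch of it, assuming the blocks already exist as Python lists of ID strings. The function name `write_ids_to_delete` and the one-ID-per-row output layout are hypothetical choices for illustration, and the demo writes to an in-memory buffer rather than a real file.

```python
import csv
import io


def write_ids_to_delete(blocks, out_file):
    """Write every ID except the first of each block, one ID per row.

    The first ID of each block (block[0]) is the user to keep, so it is
    left out of the output on purpose.
    """
    writer = csv.writer(out_file)
    for block in blocks:
        for user_id in block[1:]:
            writer.writerow([user_id])


# Blocks taken from the desired outcome shown above.
example_blocks = [["1", "65"], ["2", "67"], ["4", "12", "53", "101"]]

# In-memory stand-in for something like open("to_delete.csv", "w", newline="").
buffer = io.StringIO()
write_ids_to_delete(example_blocks, buffer)
print(buffer.getvalue().split())
```

The resulting file would then hold exactly the IDs to delete in Excel, while 1, 2 and 4 stay.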
import csv
with open("contacts.csv", "rt") as f:
    reader = csv.reader(f, delimiter="\t")
    contacts = []
    for row in reader:
        if row[0] not in contacts:
            contacts.append(row[0])
        if row[1] not in contacts:
            position = contacts.index(row[0])
            contacts[position].append(row[1])
Of course I get the error "AttributeError: 'str' object has no attribute 'append'", because contacts[position] is a string. How can I change the code so that I get a list for each block of duplicate contacts?
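For reference, this is roughly the behaviour I am after, written as a rough sketch rather than a finished solution. It keeps a dictionary that maps every ID to the list (block) it belongs to, so an ID that already appeared as someone's duplicate, like 12, does not start a block of its own. The names `group_duplicates` and `block_of` are made up for illustration, and I am assuming the file is tab-separated with a single header row; the demo reads from an in-memory string instead of contacts.csv.

```python
import csv
import io


def group_duplicates(pairs):
    """Group (id, duplicate) pairs into blocks of IDs that belong together."""
    block_of = {}  # maps each ID to the block (list) that contains it
    blocks = []    # all blocks, in order of first appearance
    for user_id, dup_id in pairs:
        # Reuse the block of whichever ID was seen before, else start a new one.
        target = block_of.get(user_id)
        if target is None:
            target = block_of.get(dup_id)
        if target is None:
            target = []
            blocks.append(target)
        for uid in (user_id, dup_id):
            current = block_of.get(uid)
            if current is None:
                target.append(uid)
                block_of[uid] = target
            elif current is not target:
                # The two IDs sit in different blocks: merge them into one.
                for member in current:
                    block_of[member] = target
                target.extend(current)
                current.clear()
    # Blocks emptied by a merge are dropped.
    return [block for block in blocks if block]


# In-memory stand-in for open("contacts.csv", newline="").
sample = io.StringIO(
    "ID\tDuplicate\n1\t65\n2\t67\n4\t12\n4\t53\n4\t101\n12\t4\n12\t53\n"
)
reader = csv.reader(sample, delimiter="\t")
next(reader)  # skip the header row (assumed to be present)
pairs = [(row[0], row[1]) for row in reader if len(row) >= 2]
for block in group_duplicates(pairs):
    print(block)
```

The dictionary lookup also avoids scanning a list with `in` and `index` for every one of the 120,000 rows.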
Thanks!