
I'm using SQL Server with pyodbc in Python. I want to build a list of lists (or a dictionary, pandas DataFrame, etc.) of the unique IDs of rows that have duplicate values across multiple columns. For example, I have a table like this:

ID      size   page   rate
12345   6      12     20
67890   6      12     20
23456   4      10     15
87654   4      10     15
43210   4      10     15
....

Columns size, page, and rate are duplicated for rows 1-2 and for rows 3-5, so I need to group the IDs together, for example as a list of lists:

duplicates = [[12345, 67890], [23456, 87654, 43210],...] 

After my cursor executes and I get the table above as results, I try:

duplicates = []
row1 = [row[1] for row in cursor]
row2 = [row[1] for row in cursor]
counter = 1
index = 0
for row in cursor:
    if index <= len(row1) - 2:
        n0 = row1[index]
        n1 = row2[index]
        n2 = row1[index + 1]
        n3 = row2[index + 1]
        if n0 == n1 and n2 == n3:
            duplicates.append(row[0])
        else:
            counter += 1
        index += 1
    else:
        break

It's not working, but any help and guidance will be appreciated. Thanks!
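For comparison, the grouping described above can be done in a single pass with a plain dictionary keyed on the three columns. This is a minimal sketch with the sample table hard-coded in place of the pyodbc cursor, assuming each fetched row is an `(ID, size, page, rate)` tuple:

```python
from collections import defaultdict

# Rows as they would come back from cursor.fetchall():
# (ID, size, page, rate)
rows = [
    (12345, 6, 12, 20),
    (67890, 6, 12, 20),
    (23456, 4, 10, 15),
    (87654, 4, 10, 15),
    (43210, 4, 10, 15),
]

# Group IDs by the (size, page, rate) triple in one pass.
groups = defaultdict(list)
for id_, size, page, rate in rows:
    groups[(size, page, rate)].append(id_)

# Keep only the groups that actually contain duplicates.
duplicates = [ids for ids in groups.values() if len(ids) > 1]
# duplicates -> [[12345, 67890], [23456, 87654, 43210]]
```

This avoids the pitfall in the attempt above: a pyodbc cursor is a forward-only iterator, so the second list comprehension (and the later `for row in cursor`) sees an already-exhausted cursor and gets nothing.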

Gordon Linoff
  • Refer to this post to read from [`SQL`](https://stackoverflow.com/a/14487936/4985099), followed by ``df.groupby(['size','page','rate'])['ID'].apply(list).tolist()`` – sushanth Jul 25 '20 at 09:29

1 Answer


You can convert the query result into a pandas DataFrame (say, `df`). Then you can extract the duplicate IDs by grouping on all three columns:

df1 = df.groupby(['size', 'page', 'rate'])['ID'].apply(list).reset_index(name='duplicates')
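To make this concrete, here is a self-contained sketch with the question's sample data hard-coded in place of the pyodbc query (in practice you would build `df` from the cursor, e.g. with `pandas.read_sql`). Grouping on all three columns and keeping only groups with more than one ID yields the requested list of lists:

```python
import pandas as pd

# Sample data standing in for the SQL query result.
df = pd.DataFrame({
    "ID":   [12345, 67890, 23456, 87654, 43210],
    "size": [6, 6, 4, 4, 4],
    "page": [12, 12, 10, 10, 10],
    "rate": [20, 20, 15, 15, 15],
})

# Collect the IDs for each (size, page, rate) combination...
grouped = df.groupby(["size", "page", "rate"])["ID"].apply(list)

# ...and keep only the combinations that occur more than once.
duplicates = [ids for ids in grouped if len(ids) > 1]
```

Note that `groupby` sorts the group keys by default, so the order of the inner lists may differ from the table's row order; pass `sort=False` to `groupby` to preserve first-appearance order.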

Samiie