
I'm using SQL Server with pyodbc in Python. I want to build a list of lists (or a dictionary, pandas DataFrame, etc.) of the unique IDs of rows that have duplicate values across multiple columns. For example, I have a table like this:

ID      size   page   rate
12345   6      12     20
67890   6      12     20
23456   4      10     15
87654   4      10     15
43210   4      10     15
....

Columns size, page, and rate are duplicated for rows 1-2 and for rows 3-5, so I need to group the IDs together, for example as a list of lists:

duplicates = [[12345, 67890], [23456, 87654, 43210],...] 

After my cursor executes and I get the table above as results, I try:

duplicates = []
row1 = [row[1] for row in cursor]
row2 = [row[1] for row in cursor]
counter = 1
index = 0
for row in cursor:
    if index <= len(row1) - 2:
        n0 = row1[index]
        n1 = row2[index]
        n2 = row1[index + 1]
        n3 = row2[index + 1]
        if n0 == n1 and n2 == n3:
            duplicates.append(row[0])
        else:
            counter += 1
        index += 1
    else:
        break

It's not working, but any help and guidance will be appreciated. Thanks!
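For comparison, the grouping described above can be done in a single pass with a plain dictionary keyed on the three columns. This is a minimal sketch with the sample table hard-coded in place of the pyodbc cursor, assuming each fetched row is an `(ID, size, page, rate)` tuple:

```python
from collections import defaultdict

# Rows as they would come back from cursor.fetchall():
# (ID, size, page, rate)
rows = [
    (12345, 6, 12, 20),
    (67890, 6, 12, 20),
    (23456, 4, 10, 15),
    (87654, 4, 10, 15),
    (43210, 4, 10, 15),
]

# Group IDs by the (size, page, rate) triple in one pass.
groups = defaultdict(list)
for id_, size, page, rate in rows:
    groups[(size, page, rate)].append(id_)

# Keep only the groups that actually contain duplicates.
duplicates = [ids for ids in groups.values() if len(ids) > 1]
# duplicates -> [[12345, 67890], [23456, 87654, 43210]]
```

This avoids the pitfall in the attempt above: a pyodbc cursor is a forward-only iterator, so the second list comprehension (and the later `for row in cursor`) sees an already-exhausted cursor and gets nothing.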

Gordon Linoff
  • Refer to this post to read from [`SQL`](https://stackoverflow.com/a/14487936/4985099), followed by ``df.groupby(['size','page','rate'])['ID'].apply(list).tolist()`` – sushanth Jul 25 '20 at 09:29

1 Answer


You can convert the query result into a pandas DataFrame (say, `df`). Then you can extract the duplicate IDs by grouping on all three columns:

df1 = df.groupby(['size', 'page', 'rate'])['ID'].apply(list).reset_index(name='duplicates')
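To make this concrete, here is a self-contained sketch with the question's sample data hard-coded in place of the pyodbc query (in practice you would build `df` from the cursor, e.g. with `pandas.read_sql`). Grouping on all three columns and keeping only groups with more than one ID yields the requested list of lists:

```python
import pandas as pd

# Sample data standing in for the SQL query result.
df = pd.DataFrame({
    "ID":   [12345, 67890, 23456, 87654, 43210],
    "size": [6, 6, 4, 4, 4],
    "page": [12, 12, 10, 10, 10],
    "rate": [20, 20, 15, 15, 15],
})

# Collect the IDs for each (size, page, rate) combination...
grouped = df.groupby(["size", "page", "rate"])["ID"].apply(list)

# ...and keep only the combinations that occur more than once.
duplicates = [ids for ids in grouped if len(ids) > 1]
```

Note that `groupby` sorts the group keys by default, so the order of the inner lists may differ from the table's row order; pass `sort=False` to `groupby` to preserve first-appearance order.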

Samiie