I have data in a database table and I'm exporting this to a file like this and there are around 100k records (this is kind of duplicate based on id)
id | dp_1 | pp_1 | Phone |
-------|---------|-------|--------|
1 | dp1 | | phone1 |
----------------------------------|
1 | | pp1 | phone1 |
----------------------------------|
2 | dp2 | pp2 | phone2 |
------------------------------------
2 | | | phone4 |
-----------------------------------
3 | dp3 | pp3 | phone3 |
------------------------------------
3 | dp3 | | phone3 |
-----------------------------------
4 | | pp4 | |
------------------------------------
4 | dp4 | | |
I wanted the result to be as below:
id | dp_1 | pp_1 | Phone |
-------|---------|-------|-----------------|
1 | dp1 | pp1 | phone1 - phone1 |
-------------------------------------------|
2 | dp2 | pp2 | phone2 - phone4 |
-------------------------------------------|
3 | dp3 | pp3 | phone3 |
-------------------------------------------|
4 | dp4 | pp4 | |
--------------------------------------------
I have written this SQL:
WITH cte AS (
SELECT*,
row_number() OVER(PARTITION BY id,DP_1, PP_1, phone ORDER BY id desc)
AS [rn]
FROM table1
)
Select * into #temp from cte WHERE [rn] = 1 ORDER BY id
How can I achieve this in Python OR using SQL query? I'm using Anaconda.