0

I’m trying to translate a window-function from SQL to Pandas, which is only applied under the condition, that a match is possible – otherwise a NULL (None) value is inserted.

SQL-Code (example)

SELECT 
    [ID_customer]
    [cTimestamp]
    [TMP_Latest_request].[ID_req] AS [ID of Latest request]
FROM [table].[Customer] AS [Customer]
LEFT JOIN (
    SELECT * FROM(
        SELECT [ID_req], [ID_customer], [rTimestamp], 
        RANK() OVER(PARTITION BY ID_customer ORDER BY rTimestamp DESC) as rnk
        FROM [table].[Customer_request]
    ) AS [Q]
    WHERE rnk = 1
) AS [TMP_Latest_request]
ON [Customer].[ID_customer] = [TMP_Latest_request].[ID_customer]

Example

Joining the ID of the latest customer request (if exists) to the customer.

table:Customer
+-------------+------------+
| ID_customer | cTimestamp |
+-------------+------------+
|           1 |       2014 |
|           2 |       2014 |
|           3 |       2015 |
+-------------+------------+

table: Customer_request
+--------+-------------+------------+
| ID_req | ID_customer | rTimestamp |
+--------+-------------+------------+
|      1 |           1 |       2012 |
|      2 |           1 |       2013 |
|      3 |           1 |       2014 |
|      4 |           2 |       2014 |
+--------+-------------+------------+

Result: table:merged
+-------------+------------+----------------------+
| ID_customer | cTimestamp | ID of Latest request |
+-------------+------------+----------------------+
|           1 |       2014 | 3                    |
|           2 |       2014 | 4                    |
|           3 |       2015 | None/NULL            |
+-------------+------------+----------------------+

What is the equivalent in Python Pandas?

Barbaros Özhan
  • 59,113
  • 10
  • 31
  • 55
Nex
  • 421
  • 1
  • 4
  • 17

1 Answers1

0

Instead of using RANK() function, you can simply using the below, and it is easy to convert.

SELECT A.ID_Customer,A.cTimeStamp,B.ID_req
FROM Customer A
LEFT JOIN (
    SELECT ID_Customer,MAX(ID_req)ID_req 
    FROM Customer_request
    GROUP BY ID_Customer
   )B
ON A.ID_Customer = B.ID_Customer

Try the following query, if you are facing any issues, ask me in the comments.

DineshDB
  • 5,998
  • 7
  • 33
  • 49
  • Thanks for the suggestion. This will work with this simple example, but won't work with more sophisticated utilizations of the RANK. For example, if I have a list of certain rank values I want to match on, except of just the MAX value. – Nex Nov 06 '19 at 06:25
  • @Nex, Try this link https://stackoverflow.com/questions/17775935/sql-like-window-functions-in-pandas-row-numbering-in-python-pandas-dataframe – DineshDB Nov 06 '19 at 07:04