How do I do a SQL-like window function in Pandas?

Question

I have data that looks like the following:

org_id	org_name	person_id	date
111	Microsoft	453241	1/1/05
222	Zebra	21341	6/1/95
333	Company	42343241	1/1/23
111	Microsoft	098678	2/1/13
111	Microsoft Inc	6786	6/1/23
222	Zebra	546	4/1/06
333	Company	vcxv313	2/1/23
222	NewZebra	876	4/1/23
333	Company	432gf	4/1/23

And I want to run Pandas functions similar to this type of SQL query:

SELECT org_id, org_name
FROM (
SELECT ROW_NUMBER() OVER(PARTITION BY org_id ORDER BY date DESC) as row_num,
org_id, org_name
FROM dataframe
)
WHERE row_num = 1

result set should be:

org_id	org_name
111	Microsoft Inc
222	NewZebra
333	Company

I'm finding myself having trouble with the Pandas groupby syntax and aggregate functions. Any help would be appreciated

Does this answer your question? [SQL-like window functions in PANDAS: Row Numbering in Python Pandas Dataframe](https://stackoverflow.com/questions/17775935/sql-like-window-functions-in-pandas-row-numbering-in-python-pandas-dataframe) — Ankur Sinha, Jul 27 '23 at 12:54

score 0 · Answer 1 · answered Jul 27 '23 at 13:00

In pandas we can sort the dataframe by date in desc order then drop the duplicates

df['date_'] = pd.to_datetime(df['date'], format='%d/%m/%y')
result = (
    df
    .sort_values('date_', ascending=False)
    .drop_duplicates(subset=['org_id'])
    .reindex(columns=['org_id', 'org_name'])
)

   org_id       org_name
4     111  Microsoft Inc
7     222       NewZebra
8     333        Company

How do I do a SQL-like window function in Pandas?

1 Answers1