0

I have data that looks like the following:

org_id org_name person_id date
111 Microsoft 453241 1/1/05
222 Zebra 21341 6/1/95
333 Company 42343241 1/1/23
111 Microsoft 098678 2/1/13
111 Microsoft Inc 6786 6/1/23
222 Zebra 546 4/1/06
333 Company vcxv313 2/1/23
222 NewZebra 876 4/1/23
333 Company 432gf 4/1/23

And I want to run Pandas functions similar to this type of SQL query:

SELECT org_id, org_name
FROM (
SELECT ROW_NUMBER() OVER(PARTITION BY org_id ORDER BY date DESC) as row_num,
org_id, org_name
FROM dataframe
)
WHERE row_num = 1

result set should be:

org_id org_name
111 Microsoft Inc
222 NewZebra
333 Company

I'm finding myself having trouble with the Pandas groupby syntax and aggregate functions. Any help would be appreciated

simplycoding
  • 2,770
  • 9
  • 46
  • 91
  • 1
    Does this answer your question? [SQL-like window functions in PANDAS: Row Numbering in Python Pandas Dataframe](https://stackoverflow.com/questions/17775935/sql-like-window-functions-in-pandas-row-numbering-in-python-pandas-dataframe) – Ankur Sinha Jul 27 '23 at 12:54

1 Answers1

0

In pandas we can sort the dataframe by date in desc order then drop the duplicates

df['date_'] = pd.to_datetime(df['date'], format='%d/%m/%y')
result = (
    df
    .sort_values('date_', ascending=False)
    .drop_duplicates(subset=['org_id'])
    .reindex(columns=['org_id', 'org_name'])
)

   org_id       org_name
4     111  Microsoft Inc
7     222       NewZebra
8     333        Company
Shubham Sharma
  • 68,127
  • 6
  • 24
  • 53