0

dataFrame:

ProName ProCat  Price
EJBR45  EL  5432.00
XYCK23  MH  656.00
RMX57   EL  8787.00
FG567   CO  56548.00
GHK245  EC  56456.00
EJBR45  EL  6665.00
XYCK23  MH  6576.00
RMX57   EL  15465.00
FG567   CO  78887.00
GHK245  EC  54654.00
EJBR45  EL  43556.00
XYCK23  MH  98445.00
FG567   CO  65436.00
GHK245  EC  654365.00

In SQL I use below query:

select ProName, ProCat, max(Price) as Price 
from Dtatatatata
group by ProName,ProCat

Results:

ProName ProCat  Price
FG567   CO  78887.00
GHK245  EC  654365.00
EJBR45  EL  6665.00
RMX57   EL  8787.00
XYCK23  MH  98445.00

Can we do this in python?

I have tried Python : Getting the Row which has the max value in groups using groupby but did not understand. Please guide

Applied in 1.5Million records dataframe, Performance is too slow:

distData = dataAll.set_index(['Donor', 'Recipient', 'Commodity Aggregation Type', 'Aid Category', 'Measure', 'Unit', 'Frequency', 'Date']).max(level=[0,1,2,3,4,5,6,7]).reset_index()
marc_s
  • 732,580
  • 175
  • 1,330
  • 1,459
Learnings
  • 2,780
  • 9
  • 35
  • 55
  • 2
    (EDIT) `df.set_index(['ProName', 'ProCat']).max(level=[0, 1])` – cs95 Dec 22 '17 at 06:43
  • @cᴏʟᴅsᴘᴇᴇᴅ, thanks working fine. – Learnings Dec 22 '17 at 06:46
  • @cᴏʟᴅsᴘᴇᴇᴅ, I stored result of above script in new dataframe "df2", then list(df2) gives ['Price'] only, how to access other 2 columns: ProName & ProCat – Learnings Dec 22 '17 at 06:50
  • Reset the index: `df = df.set_index(['ProName', 'ProCat']).max(level=[0, 1]).reset_index()` – cs95 Dec 22 '17 at 06:50
  • @cᴏʟᴅsᴘᴇᴇᴅ, thanks a lot... now fine. – Learnings Dec 22 '17 at 06:52
  • @cᴏʟᴅsᴘᴇᴇᴅ, Pls let me know what is level=[0, 1], is this position of first 2 columns, I have another dataframe with 9 columns, last one Value, So I should write level=[0,1,2,3,4,5,6,7] – Learnings Dec 22 '17 at 07:23
  • @cᴏʟᴅsᴘᴇᴇᴅ, I think speed of execution is slow, I have applied this to 1.5Million records, performance too slow, script updated in question, please check. – Learnings Dec 22 '17 at 07:25
  • 1
    Try this: dataAll['ValueMax'] = dataAll.groupby(['Donor', 'Recipient', 'Commodity Aggregation Type', 'Aid Category', 'Measure', 'Unit', 'Frequency', 'Date'])['Value'].transform(max) – Plinus Dec 22 '17 at 12:25
  • @Plinus, thanks a lot, this is working fine. – Learnings Dec 22 '17 at 12:27

0 Answers0