Sorting vs iterative querying question in Pandas

Question

I recently picked up a project a little out of my comfort zone and I'm not sure how to approach part of it. This may be a duplicate, but I haven't been able to find any solid answers in my searching. I've worked in other languages, but I'm new to Python/pandas, which is what I've been asked to do this in.

My end goal is an automated script that sends managers an "In Approval" table of all active orders waiting to be approved. The part I'm having issues with is tackling the actual data. I import the data from a CSV, and my initial plan was to split the CSV's columns into two dataframes: one with the active approvals and one with all the order data. I was then going to iterate through the approvals by unique ID and run each one through a class that queries and returns the orders matching that ID. Then I found out that Pandas doesn't iterate like I'm used to in other languages, where I'd just run it through a ForEach.

So my question is: is there a best-known method to iterate through and query data like this, or is there some Pandas magic I'm missing that will let me sort and pull out data I can format into an HTML table for presentation?

EDIT: Here is a simplified and bleached version of the data I'm working with and what I'm trying to turn it into. I did this as a table for ease of readability.

Approval_Id	Approval_Status	Approver_Status	Approver_Type	Approver_Name	Receiver	Total_Cost	Product
1138	ACTIVE	Approved	Manager	Krabs, Eugene	SquarePants, SpongeBob	26375	Network Gear
1138	ACTIVE	Approved	Manager	Krabs, Eugene	SquarePants, SpongeBob	26375	PC Gear
1138	ACTIVE	Awaiting Approval	Finance	Hira, Jeffery	SquarePants, SpongeBob	NA	Network Gear
1138	ACTIVE	Awaiting Approval	Finance	Hira, Jeffery	SquarePants, SpongeBob	NA	PC Gear
1138	ACTIVE	To be approved	Signature Authority	Pennyworth, Alfred	SquarePants, SpongeBob	NA	Network Gear
1138	ACTIVE	To be approved	Signature Authority	Pennyworth, Alfred	SquarePants, SpongeBob	NA	PC Gear
1138	ACTIVE	To be approved	Signature Authority	Pines, Stan	SquarePants, SpongeBob	NA	Network Gear
1138	ACTIVE	To be approved	Signature Authority	Pines, Stan	SquarePants, SpongeBob	NA	PC Gear
6585	APPROVED	Approved	Finance	Hira, Jeffery	Omashu, Bumi	NA	Network Gear
6585	APPROVED	Approved	Finance	Hira, Jeffery	Omashu, Bumi	NA	PC Gear
6585	APPROVED	Approved	Finance	Hira, Jeffery	Omashu, Bumi	NA	Other
6585	APPROVED	Approved	Manager	Kuei, Earth King	Omashu, Bumi	194485	Network Gear
6585	APPROVED	Approved	Manager	Kuei, Earth King	Omashu, Bumi	194485	PC Gear
6585	APPROVED	Approved	Manager	Kuei, Earth King	Omashu, Bumi	194485	Other
6585	APPROVED	Approved	Signature Authority	Pennyworth, Alfred	Omashu, Bumi	NA	Network Gear
6585	APPROVED	Approved	Signature Authority	Pennyworth, Alfred	Omashu, Bumi	NA	PC Gear
6585	APPROVED	Approved	Signature Authority	Pennyworth, Alfred	Omashu, Bumi	NA	Other
6585	APPROVED	Approved	Signature Authority	Pines, Stan	Omashu, Bumi	NA	Network Gear
6585	APPROVED	Approved	Signature Authority	Pines, Stan	Omashu, Bumi	NA	PC Gear
6585	APPROVED	Approved	Signature Authority	Pines, Stan	Omashu, Bumi	NA	Other

I'm looking to return only the rows with ACTIVE under Approval_Status and get rid of the duplicate entries in the Approver columns, while keeping a single copy of the number from Total_Cost. This is what I want the end state to look like:

Approval_Id	Approver_Status	Approver_Type	Approver_Name	Receiver	Total_Cost
1138	Approved	Manager	Krabs, Eugene	SquarePants, SpongeBob	26375
1138	Awaiting Approval	Finance	Hira, Jeffery	SquarePants, SpongeBob	26375
1138	To be approved	Signature Authority	Pennyworth, Alfred	SquarePants, SpongeBob	26375
1138	To be approved	Signature Authority	Pines, Stan	SquarePants, SpongeBob	26375

You can iterate over a column of a dataframe with standard python for x in y. You can also iterate over the entire dataframe with iterrows or itertuples. — Eric Truett, Apr 19 '21 at 23:15
Python's `for` is analogous to `foreach` in other languages so if you're familiar with it then just iterate using for, although I would recommend looking at some vectorized functions included with Pandas, things like `groupby` might be useful in your project. Can't say more without seeing what your data looks like. — NotAName, Apr 19 '21 at 23:28
You need to provide more details about your concrete problem. A [mcve] would be ideal. Consult [the following question](https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples) for how to make reproducible pandas examples — juanpa.arrivillaga, Apr 20 '21 at 00:02
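
For reference, the iteration options mentioned in the comments above could look roughly like this. It is only a sketch on a tiny made-up frame (the three rows are a hypothetical cut of the question's data), not a recommendation to loop where a vectorized call would do:

```python
import pandas as pd

# Hypothetical three-row sample using column names from the question.
df = pd.DataFrame({
    "Approval_Id": [1138, 1138, 6585],
    "Approver_Name": ["Krabs, Eugene", "Hira, Jeffery", "Kuei, Earth King"],
    "Approver_Status": ["Approved", "Awaiting Approval", "Approved"],
})

# A plain `for` over a single column yields that column's values.
names = []
for name in df["Approver_Name"]:
    names.append(name)

# itertuples() yields one namedtuple per row; columns become attributes.
pairs = [(row.Approval_Id, row.Approver_Status)
         for row in df.itertuples(index=False)]

# Iterating a groupby yields (key, sub-DataFrame) pairs, one per unique ID,
# which is close to "for each approval, give me its rows".
rows_per_id = {approval_id: len(group)
               for approval_id, group in df.groupby("Approval_Id")}
```

The `groupby` loop is probably the closest match to the ForEach-plus-query plan described in the question, since each `group` is already the set of rows for one Approval_Id.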

score 1 · Accepted Answer · answered Apr 20 '21 at 01:56

This will do what you ask, split into filter, column-selection, and duplicate-dropping operations for clarity:

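# Keep only the rows whose approval is still active.
df = df.loc[df.Approval_Status == "ACTIVE"]
# Select the output columns; selecting several columns takes a list (double brackets).
df = df[["Approval_Id", "Approver_Status", "Approver_Type", "Approver_Name", "Receiver", "Total_Cost"]]
# Collapse the rows that differed only by Product.
df = df.drop_duplicates()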
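
One thing to watch: the desired output in the question repeats 26375 on every row, while the sample data only carries Total_Cost on the Manager rows, so the three steps above would leave NA on the others. Below is a self-contained sketch of one way to handle that, assuming each Approval_Id has a single cost value. The sample rows are retyped from the question, and the `wanted`, `active`, and `html` names are just illustrative:

```python
import pandas as pd

NA = float("nan")  # stands in for the "NA" cells in the question's table
columns = ["Approval_Id", "Approval_Status", "Approver_Status", "Approver_Type",
           "Approver_Name", "Receiver", "Total_Cost", "Product"]
rows = [
    (1138, "ACTIVE", "Approved", "Manager", "Krabs, Eugene", "SquarePants, SpongeBob", 26375, "Network Gear"),
    (1138, "ACTIVE", "Approved", "Manager", "Krabs, Eugene", "SquarePants, SpongeBob", 26375, "PC Gear"),
    (1138, "ACTIVE", "Awaiting Approval", "Finance", "Hira, Jeffery", "SquarePants, SpongeBob", NA, "Network Gear"),
    (1138, "ACTIVE", "Awaiting Approval", "Finance", "Hira, Jeffery", "SquarePants, SpongeBob", NA, "PC Gear"),
    (1138, "ACTIVE", "To be approved", "Signature Authority", "Pennyworth, Alfred", "SquarePants, SpongeBob", NA, "Network Gear"),
    (1138, "ACTIVE", "To be approved", "Signature Authority", "Pennyworth, Alfred", "SquarePants, SpongeBob", NA, "PC Gear"),
    (1138, "ACTIVE", "To be approved", "Signature Authority", "Pines, Stan", "SquarePants, SpongeBob", NA, "Network Gear"),
    (1138, "ACTIVE", "To be approved", "Signature Authority", "Pines, Stan", "SquarePants, SpongeBob", NA, "PC Gear"),
    (6585, "APPROVED", "Approved", "Manager", "Kuei, Earth King", "Omashu, Bumi", 194485, "Other"),
    (6585, "APPROVED", "Approved", "Finance", "Hira, Jeffery", "Omashu, Bumi", NA, "Other"),
]
df = pd.DataFrame(rows, columns=columns)

wanted = ["Approval_Id", "Approver_Status", "Approver_Type",
          "Approver_Name", "Receiver", "Total_Cost"]

# Filter to active approvals and keep only the output columns in one step.
active = df.loc[df["Approval_Status"] == "ACTIVE", wanted].copy()

# Copy the first non-missing cost within each approval onto all of its rows.
active["Total_Cost"] = active.groupby("Approval_Id")["Total_Cost"].transform("first")

# Rows that differed only by Product are now identical, so drop the repeats.
active = active.drop_duplicates().reset_index(drop=True)

# Render the result as an HTML table for the email body.
html = active.to_html(index=False)
```

Filling the cost before `drop_duplicates()` matters here: if the NA rows were deduplicated first, nothing would change for this sample, but doing the fill first keeps the logic correct even when the same approver appears both with and without a cost.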

anon01

This is great. Thank you! – TheMean Won Apr 20 '21 at 18:11
