I recently picked up a project a little outside my comfort zone, and I'm not sure how to approach part of it. This may be a duplicate, but I haven't found any solid answers in my searching. I've worked in other languages, but I'm new to Python/pandas, which is what I've been asked to use for this.

My end goal is an automated script that sends managers an "In Approval" table of all active orders waiting to be approved. The part I'm having trouble with is handling the actual data. I import the data from a CSV, and my initial plan was to split the CSV's columns into two dataframes: one with the active approvals and one with all the order data. I was then going to iterate through the approvals by their unique ID and pass each one to a class that queries and returns the orders matching that ID. Then I found out that pandas doesn't iterate the way I'm used to in other languages, where I'd just run it through a ForEach.
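In pandas, the "for each unique ID, query the matching orders" plan maps fairly directly onto `DataFrame.groupby`, which yields each ID together with a sub-DataFrame holding only that ID's rows, so a separate query class may not be needed. A minimal sketch, assuming a few made-up rows that reuse the column names from the sample data:

```python
import pandas as pd

# Hypothetical rows reusing the column names from the sample data
df = pd.DataFrame({
    "Approval_Id": [1138, 1138, 6585],
    "Approval_Status": ["ACTIVE", "ACTIVE", "APPROVED"],
    "Approver_Name": ["Krabs, Eugene", "Hira, Jeffery", "Hira, Jeffery"],
})

# groupby acts like "ForEach ID, query the matching orders":
# each iteration yields one Approval_Id and a DataFrame of just its rows
orders_by_id = {}
for approval_id, orders in df.groupby("Approval_Id"):
    orders_by_id[approval_id] = orders
    print(approval_id, len(orders))

# A one-off query for a single ID is a boolean filter instead of a loop
single = df[df["Approval_Id"] == 1138]
```

The same loop body could build one table per ID; often, though, a filter plus a few vectorized operations avoids the loop entirely.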

So my question is: is there a best-known method for iterating through and querying data like this, or is there some pandas magic I'm missing that will let me sort and pull out data I can format into an HTML table for presentation?
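For the presentation step, `DataFrame.to_html` renders a DataFrame as an HTML `<table>` string that could then be placed in the body of the notification. A minimal sketch, assuming a small, already-filtered report frame with made-up rows:

```python
import pandas as pd

# Hypothetical, already-filtered report rows
report = pd.DataFrame({
    "Approval_Id": [1138, 1138],
    "Approver_Status": ["Approved", "Awaiting Approval"],
    "Approver_Name": ["Krabs, Eugene", "Hira, Jeffery"],
    "Total_Cost": [26375, 26375],
})

# to_html returns the table markup as a string; index=False omits the row numbers
html_table = report.to_html(index=False)
print(html_table)
```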

EDIT: Here is a simplified and sanitized version of the data I'm working with, and what I'm trying to turn it into. I've formatted it as a table for readability.

| Approval_Id | Approval_Status | Approver_Status | Approver_Type | Approver_Name | Receiver | Total_Cost | Product |
|---|---|---|---|---|---|---|---|
| 1138 | ACTIVE | Approved | Manager | Krabs, Eugene | SquarePants, SpongeBob | 26375 | Network Gear |
| 1138 | ACTIVE | Approved | Manager | Krabs, Eugene | SquarePants, SpongeBob | 26375 | PC Gear |
| 1138 | ACTIVE | Awaiting Approval | Finance | Hira, Jeffery | SquarePants, SpongeBob | NA | Network Gear |
| 1138 | ACTIVE | Awaiting Approval | Finance | Hira, Jeffery | SquarePants, SpongeBob | NA | PC Gear |
| 1138 | ACTIVE | To be approved | Signature Authority | Pennyworth, Alfred | SquarePants, SpongeBob | NA | Network Gear |
| 1138 | ACTIVE | To be approved | Signature Authority | Pennyworth, Alfred | SquarePants, SpongeBob | NA | PC Gear |
| 1138 | ACTIVE | To be approved | Signature Authority | Pines, Stan | SquarePants, SpongeBob | NA | Network Gear |
| 1138 | ACTIVE | To be approved | Signature Authority | Pines, Stan | SquarePants, SpongeBob | NA | PC Gear |
| 6585 | APPROVED | Approved | Finance | Hira, Jeffery | Omashu, Bumi | NA | Network Gear |
| 6585 | APPROVED | Approved | Finance | Hira, Jeffery | Omashu, Bumi | NA | PC Gear |
| 6585 | APPROVED | Approved | Finance | Hira, Jeffery | Omashu, Bumi | NA | Other |
| 6585 | APPROVED | Approved | Manager | Kuei, Earth King | Omashu, Bumi | 194485 | Network Gear |
| 6585 | APPROVED | Approved | Manager | Kuei, Earth King | Omashu, Bumi | 194485 | PC Gear |
| 6585 | APPROVED | Approved | Manager | Kuei, Earth King | Omashu, Bumi | 194485 | Other |
| 6585 | APPROVED | Approved | Signature Authority | Pennyworth, Alfred | Omashu, Bumi | NA | Network Gear |
| 6585 | APPROVED | Approved | Signature Authority | Pennyworth, Alfred | Omashu, Bumi | NA | PC Gear |
| 6585 | APPROVED | Approved | Signature Authority | Pennyworth, Alfred | Omashu, Bumi | NA | Other |
| 6585 | APPROVED | Approved | Signature Authority | Pines, Stan | Omashu, Bumi | NA | Network Gear |
| 6585 | APPROVED | Approved | Signature Authority | Pines, Stan | Omashu, Bumi | NA | PC Gear |
| 6585 | APPROVED | Approved | Signature Authority | Pines, Stan | Omashu, Bumi | NA | Other |

I'm looking to return only the rows whose Approval_Status is ACTIVE, get rid of the duplicate entries in the Approver columns, and take the single Total_Cost number (which only the Manager rows carry) for every row. This is what I want the end state to look like:

| Approval_Id | Approver_Status | Approver_Type | Approver_Name | Receiver | Total_Cost |
|---|---|---|---|---|---|
| 1138 | Approved | Manager | Krabs, Eugene | SquarePants, SpongeBob | 26375 |
| 1138 | Awaiting Approval | Finance | Hira, Jeffery | SquarePants, SpongeBob | 26375 |
| 1138 | To be approved | Signature Authority | Pennyworth, Alfred | SquarePants, SpongeBob | 26375 |
| 1138 | To be approved | Signature Authority | Pines, Stan | SquarePants, SpongeBob | 26375 |

  • You can iterate over a column of a dataframe with standard python for x in y. You can also iterate over the entire dataframe with iterrows or itertuples. – Eric Truett Apr 19 '21 at 23:15
  • Python's `for` is analogous to `foreach` in other languages so if you're familiar with it then just iterate using for, although I would recommend looking at some vectorized functions included with Pandas, things like `groupby` might be useful in your project. Can't say more without seeing what your data looks like. – NotAName Apr 19 '21 at 23:28
  • You need to provide more details about your concrete problem. A [mcve] would be ideal. Consult [the following question](https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples) for how to make reproducible pandas examples – juanpa.arrivillaga Apr 20 '21 at 00:02
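The iteration options mentioned in the comments look roughly like this: a plain `for` over one column (a Series yields its values) and `itertuples` over whole rows, which is generally faster than `iterrows`. A minimal sketch with made-up rows:

```python
import pandas as pd

# Hypothetical rows reusing the question's column names
df = pd.DataFrame({
    "Approval_Id": [1138, 1138],
    "Approver_Name": ["Krabs, Eugene", "Hira, Jeffery"],
})

# A plain Python for-loop over a single column yields its values
names = [name for name in df["Approver_Name"]]

# itertuples yields one namedtuple per row; fields are accessed by column name
rows = []
for row in df.itertuples(index=False):
    rows.append(f"{row.Approval_Id}: {row.Approver_Name}")
print(rows)
```

Row-by-row loops work, but for filtering and reshaping like the question describes, vectorized operations such as boolean filters and `groupby` are usually simpler and faster.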

1 Answer


This will do what you ask. I've split it into separate filtering, column-selection, and duplicate-dropping steps for clarity:

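# keep only approvals that are still in progress
df = df.loc[df.Approval_Status == "ACTIVE"]
# select the report columns (a list of names needs double brackets)
df = df[["Approval_Id", "Approver_Status", "Approver_Type", "Approver_Name", "Receiver", "Total_Cost"]]
# rows that differed only by Product are now identical, so drop the repeats
df = df.drop_duplicates()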
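On the sample data, these steps still leave Total_Cost as NA on the Finance and Signature Authority rows, because only the Manager rows carry the cost, while the desired output repeats 26375 on every row. One way to copy the cost across an approval is `groupby(...).transform("max")`, which skips missing values. A sketch, assuming the CSV's NA entries are read as missing values (`pandas.read_csv` does this by default for the string "NA"); the rows below are a subset of the question's sample:

```python
import pandas as pd

# Approval 1138 from the question's sample: each approver appears once per
# Product, and only the Manager row carries Total_Cost (NA -> NaN)
approvers_1138 = [
    ("Approved", "Manager", "Krabs, Eugene", 26375),
    ("Awaiting Approval", "Finance", "Hira, Jeffery", float("nan")),
    ("To be approved", "Signature Authority", "Pennyworth, Alfred", float("nan")),
    ("To be approved", "Signature Authority", "Pines, Stan", float("nan")),
]
rows = [
    (1138, "ACTIVE", status, approver_type, name, "SquarePants, SpongeBob", cost, product)
    for status, approver_type, name, cost in approvers_1138
    for product in ("Network Gear", "PC Gear")
]
# One row of an already-approved order, which the filter should discard
rows.append((6585, "APPROVED", "Approved", "Manager", "Kuei, Earth King",
             "Omashu, Bumi", 194485, "Network Gear"))
df = pd.DataFrame(rows, columns=[
    "Approval_Id", "Approval_Status", "Approver_Status", "Approver_Type",
    "Approver_Name", "Receiver", "Total_Cost", "Product",
])

# Keep only approvals still in progress; .copy() avoids pandas'
# SettingWithCopyWarning when a column is assigned below
active = df.loc[df["Approval_Status"] == "ACTIVE"].copy()

# Broadcast each approval's single cost to all of its rows: max() skips NaN,
# so the Manager row's value wins; Int64 keeps whole numbers (and allows <NA>)
active["Total_Cost"] = (
    active.groupby("Approval_Id")["Total_Cost"].transform("max").astype("Int64")
)

# Leave Product out, then collapse rows that differed only by Product
columns = ["Approval_Id", "Approver_Status", "Approver_Type",
           "Approver_Name", "Receiver", "Total_Cost"]
result = active[columns].drop_duplicates().reset_index(drop=True)
print(result)
```

If an approval has no cost on any row, its Total_Cost simply stays `<NA>` after this step.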
anon01