1

My current script is able to loop through a directory and make multi level changes for sheets through pandas. I want to add columns in my dataframe which are based off the sheet

There are 2 sheets in my final dataframe with same columns but one sheet is named 'Voting Members' and the other is named 'Removed Members'. I need a column added to the dataframe called 'Status' where for the voting members sheet under the Status column all rows say 'Active' and for the 'Removed Members' sheet all rows say "Cancelled" example-

original dataframe for sheetname 'Voting Members'

Zip FirstName   LastName  
60618   Amur    Shonan    
62960   Karla   Cunnungham 

Desired output

Zip FirstName   LastName  Status
60618   Amur    Shonan    Active
62960   Karla   Cunnungham Active

So basically, if there are 20 rows, there should be a status column having 20 'actives; for sheetname 'Voting Members; and 20 rows of 'cancelled' for the sheet 'Removed members'.

My code so far -

import pandas as pd
import os
column_name_update_map = {'LocationName': 'Company Name','StreetAddress':'Address','City':'City','State':'State'}  
cols_to_drop =  ['TypeDescription','LastName','FirstName','StreetAddress2']     

for file in os.listdir("C:/Users/hh/Desktop/oo/python/Partner Matching"):
    if file.startswith("TVC"):
        dfs = pd.read_excel(file, sheet_name=None)
        output = dict()
        for ws, df in dfs.items():
            if ws in ["Opt-Ins", "New Voting Members", "Temporary Members"]:
                continue
            #drop unneeded columns
            temp = df.drop(cols_to_drop, errors="ignore", axis=1)
            #rename columns
            temp = temp.rename(columns=column_name_update_map)
            #drop empty columns
            temp = temp.dropna(how="all", axis=1)
            output[ws] = temp
        writer = pd.ExcelWriter(f'{file.replace(".xlsx","")} (updated headers).xlsx')
        for ws, df in output.items():
            df.to_excel(writer, index=None, sheet_name=ws)
        writer.save()
        writer.close()

How can I implement this within my code so as to get the status column for sheets within the file?

  • Does the original .xlsx file have a sheet named `Voting Members`? If so, do you want to modify that sheet in any way other than adding a `Status` column with all rows set t `Active`? Or does the original .xlsx file have no sheet named `Voting Members`, and you want to somehow create one? And are you saying that for `Removed Members` you want it to have identical data as `Voting Members` except that the values in the `Status` row are `Cancelled` instead of `Active`? – constantstranger May 03 '22 at 20:09
  • @constantstranger yes the original file has sheets named voting members and removed members. yes I want to do exactly what you have mentioned –  May 03 '22 at 20:11
  • I only wanto to add status column. nothing else. both sheets are in original file and dont need to be created. i just want to implement the new column addition as part of my code that i pasted in my question @constantstranger –  May 03 '22 at 20:23

1 Answers1

0

Within your inner loop, try something like this:

            if ws in ["Voting Members", "Removed Members"]:
                temp = df
                temp['Status'] = "Active" if ws == "Voting Members" else "Cancelled"
                temp = temp[["Zip", "FirstName", "LastName", "Status"]]
                output[ws] = temp

To add another column called Partner in each of the above sheets with the value TVC in all rows, you can add the following line:

                temp['Partner'] = "TVC"

You can put this just after the line temp = df or just before the line output[ws] = temp.

constantstranger
  • 9,176
  • 2
  • 5
  • 19
  • works perfectly. Thank you @constantstranger. I have one more similar sub question. i can create a new question if you dont think you can answer it here. I want another column called Partner under which I want all rows for both sheets to be called "TVC". I get TVC from this part in my script - if file.startswith("TVC"): –  May 03 '22 at 20:44
  • 1
    @Joe Tha Please take a look at my updated answer, which hopefully addresses your question regarding `TVC`. – constantstranger May 03 '22 at 22:54
  • Thank you @constantstranger. If you feel my question would help the SO community in the future please upvote it. It would help a budding py dev like me. thanks –  May 04 '22 at 02:25
  • I have a question about this code. instead if ws == "Voting Members" else "Cancelled". How can I do if ws.startswith = "voting members" then active else cancelled? I asked this question here too https://stackoverflow.com/questions/72119777/how-do-i-create-an-if-statement-that-checks-for-sheetnames-that-startwith-a-cert –  May 04 '22 at 22:01
  • @Joe Tha Please see my answer to your new question. – constantstranger May 04 '22 at 22:31
  • hey @constantstranger ive asked a question on shutil for this code here https://stackoverflow.com/questions/72149034/how-do-i-shutil-move-my-file-post-processing-in-python-based-on-filename no one has given a canonical answer yet. please do chime in if you would like to –  May 07 '22 at 05:35
  • @Joe Tha I have given an answer to that question. – constantstranger May 07 '22 at 12:50