1

My final objective is to create a column called 'Status' that indicates if active or cancelled based on the name of a sheet. I need it to check if the sheetname start with the word 'Full Member List'. If so then Active, else the Status column should be Cancelled. How do I do this below? I only need help with the one line in this code where I have commented #need help with below line. I get an invalid syntax error for that line

My attempt-

import pandas as pd
import os
from openpyxl import load_workbook

cols_to_drop =  ['PSI ID','PSIvet Region','PSIvet region num','Fax','County','Ship state']              
column_name_update_map = {'Account name': 'Company Name','Billing address':'Address','Billing city':'City','Billing State':'State','Billing state':'State'} 

for file in os.listdir("C:/Users/hh/Desktop/autotranscribe/python/Matching"):
    if file.startswith("PSI"):
        dfs = pd.read_excel(file, sheet_name=None,skiprows=5)
        output = dict()
        for ws, df in dfs.items():
            if any(ws.startswith(x) for x in ["New Members", "PVCC"]):
                continue  
                temp = df
                #need help with below line
                temp['Status'] = "Active" if any(ws.startwith(x) for x in == "Full Member List" else "Cancelled" )   
            #drop unneeded columns
            temp = df.drop(cols_to_drop, errors="ignore", axis=1)
            #rename columns
            temp = temp.rename(columns=column_name_update_map)
            #drop empty columns
            temp = temp.dropna(how="all", axis=1)
            temp['Partner'] = "PSI"
            output[ws] = temp
        writer = pd.ExcelWriter(f'{file.replace(".xlsx","")} (updated headers).xlsx')
        for ws, df in output.items():
            df.to_excel(writer, index=None, sheet_name=ws)
        writer.save()
        writer.close()
  • 2
    I noticed that people downvote other peoples questions very frequently on this site, because it costs no reputation (whereas downvoting answers cost reputation). However, this is generally because new fellas do not put effort into writing questions. You should try to see your question from a third person's perspective. Readers of your question do not know many things about your problem (what exactly is the error, where in the code error is occurring, and other things). So your question should have enough details to provide an answer without them having to ask you more questions. – akaAbdullahMateen May 04 '22 at 21:59
  • Give me some time, I am reading your question, and will let you know what can we do about this. – akaAbdullahMateen May 04 '22 at 22:00
  • 2
    Also, one tip: **never** explain code or errors as _comments_ inside your code. Break these down into code snippets and explain them outside the code in your question with as much details as you can and as is needed. – akaAbdullahMateen May 04 '22 at 22:02
  • @akaAbdullahMateen thank you sir. I tried my best to provide the reprex. Please do let me know if theres anything I can do to ask this in a better way and I will improve. –  May 04 '22 at 22:03
  • Ok. so next time I will remove the rest of the code when posting and focus on the line that gives the error. that makes sense. Thank you –  May 04 '22 at 22:04
  • What do you mean by `ws.startswith(x) for x in == "Full Member List"`? – akaAbdullahMateen May 04 '22 at 22:08
  • Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/244483/discussion-between-akaabdullahmateen-and-joe-tha). – akaAbdullahMateen May 04 '22 at 22:08
  • What is the error? When including code in a question, it’s best to either construct it such that other people can run it (e.g., you’re referencing your own C:/ drive which won’t work on someone else’s machine), or give very granular detail about the error (e.g., this line causes this error) – K. Thorspear May 04 '22 at 22:09
  • @akaAbdullahMateen What i mean is I want to check if the sheetname starts with the words full member list. if it does then I want to set temp['Status'] to active else 'cancelled' –  May 04 '22 at 22:09
  • @akaAbdullahMateen im able to do it like this temp['Status'] = "Active" if ws == "Full Member List" else "Cancelled". However, i want the code to pick the sheetname that starts with full member list because the sheet name changes every month. this month it was full member list 04.05. Nect month it will be full member list 04.06. This is why i just want the first 2 words 'full member list' to be picked up by the if statement –  May 04 '22 at 22:12
  • 1
    So basically, you want to check if the sheetsname has "Full Member List" in the start of it. For example. sheetname: "Full Member List Alpha" should result in `temp["Status"] = "Active"`, whereas, sheetname "Beta" should result in `temp["Status"] = "Cancelled"`. – akaAbdullahMateen May 04 '22 at 22:13
  • @K.Thorspear its a syntax error. Thank you will keep it in mind in the future questions –  May 04 '22 at 22:14
  • @akaAbdullahMateen yes exactly –  May 04 '22 at 22:14
  • Wait a minute, I am writing the answer. Thanks for cooperating! – akaAbdullahMateen May 04 '22 at 22:15

2 Answers2

0

In case you want to check for full string "Full Member List" in start of your sheetname.

temp['Status'] = "Active" if ws.startswith("Full Member List") else "Cancelled"

To check, if either of words "Full", "Member", "List" appears in the sheetname:

for x in "Full Member List".split(" "):
    if ws.startswith(x):
        temp["Status"] = "Active"
        break

if temp["Status"] != "Active":
    temp["Status"] = "Cancelled"
  • Hi @aka thanks for your response. Unfortunately, The column 'status' is not populating after i swapped the line with the first solution you provided –  May 04 '22 at 22:28
  • Can you run the code and see what is the value of `ws` on each iteration? – akaAbdullahMateen May 04 '22 at 22:31
0

I think you need to:

  • fix the indenting in your code on the line reading temp = df
  • fix the typo "startwith" to be startswith
  • consider adding logic to ignore files containing (updated headers)
  • change the line you are asking about to this:
                temp['Status'] = "Active" if ws.startswith("Full Member List") else "Cancelled"

An updated version of your code looks like this:

import pandas as pd
import os
from openpyxl import load_workbook

cols_to_drop =  ['PSI ID','PSIvet Region','PSIvet region num','Fax','County','Ship state']              
column_name_update_map = {'Account name': 'Company Name','Billing address':'Address','Billing city':'City','Billing State':'State','Billing state':'State'} 

for file in os.listdir("."):
    if file.startswith("PSI") and "(updated headers)" not in file:
        dfs = pd.read_excel(file, sheet_name=None,skiprows=5)
        output = dict()
        for ws, df in dfs.items():
            if any(ws.startswith(x) for x in ["New Members", "PVCC"]):
                continue  
            temp = df
            temp['Status'] = "Active" if ws.startswith("Full Member List") else "Cancelled"   
            #drop unneeded columns
            temp = df.drop(cols_to_drop, errors="ignore", axis=1)
            #rename columns
            temp = temp.rename(columns=column_name_update_map)
            #drop empty columns
            temp = temp.dropna(how="all", axis=1)
            temp['Partner'] = "PSI"
            output[ws] = temp
        writer = pd.ExcelWriter(f'{file.replace(".xlsx","")} (updated headers).xlsx')
        for ws, df in output.items():
            df.to_excel(writer, index=None, sheet_name=ws)
        writer.save()
        writer.close()

To test this, I created an xlsx file named PSI 001.xlsx with a sheet named Full Member List 001 containing the following anchored at cell A1:

will skip
will skip
will skip
will skip
will skip
foo
1
2
3

The output was stored to a file named PSI 001 (updated headers).xlsx with a sheet named as above with the following contents anchored at cell A1:

foo Status  Partner
1   Active  PSI
2   Active  PSI
3   Active  PSI
constantstranger
  • 9,176
  • 2
  • 5
  • 19
  • This worked. thank you so much again constantstranger! –  May 04 '22 at 22:36
  • constantstranger one more subquestion. If i want to drop a column from one of the sheets(sheetname cancelled) which is at index number 7. will this work? temp = if ws.startswith("Cancelled"):temp.drop(temp.columns[[7]], axis=1) –  May 04 '22 at 23:48
  • @Joe Tha I think you just need this: `if ws.startswith("Cancelled"): temp = temp.drop(temp.columns[7], axis=1)`. This assumes that you have already initialized temp using `temp = df` in the inner loop of your code. – constantstranger May 05 '22 at 00:28
  • Here's a related SO answer on dropping columns by index (though I think you already have the basic idea): https://stackoverflow.com/a/20301769/18135454 – constantstranger May 05 '22 at 00:30
  • this is unfortunately dropping 2 columns instead of the one i want to drop. Its probably because at index 7 and 8. Both column names are the same which may be causing this error. Ive posted here in greater detail https://stackoverflow.com/questions/72120659/how-do-i-drop-a-specific-column-from-a-sheet-in-my-xls-file-in-python-at-index-l –  May 05 '22 at 00:34
  • Please check my answer to your new question. – constantstranger May 05 '22 at 01:03
  • constantranger.one thought i had was to do a if ws.startswith("Cancelled Members"): df.drop('Active date', axis=1). This would remove the column that says active date in my 'cancelled' sheet and so the problem of two columns with the same name 'status date' would go away. I put this right after for ws, df in dfs.items(): but it isnt working. Is there a better place within the code i should place this logic so that if the column 'Active date' exists in sheetname Cancelled, it is dropped initially itself? –  May 05 '22 at 02:06
  • If I'm understanding correctly, you would need to use `df = df.drop('Active date', axis=1)`. The `drop()` method won't change `df`, you need to assign its return value to something (in this case, df). – constantstranger May 05 '22 at 02:21
  • Yes I have assigned it to df itself. like if ws.startswith("Cancelled Members"): df= df.drop('Active date', axis=1) –  May 05 '22 at 02:29
  • Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/244486/discussion-between-constantstranger-and-joe-tha). – constantstranger May 05 '22 at 02:32
  • https://stackoverflow.com/questions/72131668/how-do-i-get-datetime-based-on-files-in-my-directory-that-starts-with-a-certain –  May 05 '22 at 18:57