Python - issue with removing fields from the output

Question

I have an issue with my code. I need the script to remove fields which fill all three conditions:

the CreatedBy is koala, Book is PI or SI or II or OT or FG, and the Category **is ** Cert or CertPlus or Cap or Downside.

Currently my code removes all koala and all books and only takes the last argument. So for example my current output leaves fields only if the category is different. I would like it to show fields ONLY if all 3 arguments are met and not if koala or book = PI or SI or II or OT or FG and to show everything else which is in range. If field is created by koala and category is Cert I wish to see this field but now it is removed. Or if none of the arguments are met I also want to see those fields ( e.g. createdby is Extra, Book is NG and Category is Multiple. Now those are also removed from the output.

Example dataset:
In the link below - I wish to remove only those marked red:


current_path = os.path.dirname(os.path.realpath(sys.argv[0]))

a_path, q_path = 0, 0

def assign_path(current_path, a_path = 0, q_path = 0):
    
    files = os.listdir(current_path)
    
    for i in files:
        if re.search('(?i)activity',i):
            a_path = '\\'.join([current_path,i])
        elif re.search('(?i)query',i):
            q_path = '\\'.join([current_path,i])
    
    return a_path, q_path
        
             
a_path, q_path = assign_path(current_path)  


if a_path == 0 or q_path == 0:
    
    files = os.listdir(current_path)
    directories = []
    
    for i in files:
        if os.path.isdir(i): directories.append(i)
               
    for i in directories:
        if re.search('(?i)input',i):
            a_path, q_path = assign_path('\\'.join([current_path,i]), a_path, q_path) 

L = list(range(len(qr)))
L1 = list(range(len(qr2)))
L2 = list(range(len(ac)))

-------------------------------------------------------

qr = pd.read_excel(q_path)
qr2 = pd.read_excel(q_path)

qr_rec = qr2.iloc[[0,1]]
d = qr2.iloc[0].to_dict()
for i in list(d.keys()): d[i] = np.nan

for i in range(len(qr2)):
     if qr2.iloc[i]['LinkageLinkType'] != 'B2B_COUNTER_TRADE'\
        and qr2.iloc[i]['CreatedBy'] == 'koala_'\
            and qr2.iloc[i]['Book'] in {'PI','SI','II','OT','FG'}\
            and qr2.iloc[i]['Category'] not in {'Cert','CertPlus','Cap','Downside'}:
         while i in L: L.remove(i)  
         if qr2.iloc[i]['PrimaryRiskId'] not in list(aID):
                 qr_rec = qr_rec.append(qr2.iloc[i],ignore_index=True)

I have added the beggining of the code which allows me to use the Excel file. I have two files, one of them being a_path ( please disregard this one). The issue I have is on the q_path.

Sure, added the link to the example dataset, I wish to remove those marked red only — Paula Ciesielska, Jul 19 '22 at 14:28
I don't think your description of what to remove and the red items matches. Your description says you don't want anything in the "Cap" category, but those are red. — FCo, Jul 19 '22 at 14:51
This looks like a Pandas DataFrame, is that correct? It would help to add the Pandas tag, and make your code a [MRE], including a small DataFrame within the code itself so it can be run and tested. Screenshots of data can't be copied and pasted. — CrazyChucky, Jul 19 '22 at 15:30
You can check the conditions per column, see https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.isin.html#pandas.Series.isin and then combine them with like `condition1 & condition2`. — Carlos Horn, Jul 19 '22 at 15:35
@PaulaCiesielska But you're indexing it with `iloc`. Are you using Pandas to import that Excel file, thus creating a DataFrame? — CrazyChucky, Jul 20 '22 at 07:27
(And if not, what sort of package *are* you using?) Please provide a complete [mre], including imports. — CrazyChucky, Jul 20 '22 at 07:41
I have included the part of the code used to import the excel file, not sure how I could provide the excel itself here — Paula Ciesielska, Jul 20 '22 at 09:05
You are indeed using Pandas. Rather than attach an Excel file, create a similar (small) DataFrame that has the columns you're talking about, and include that. You might want to check out [How to make good reproducible Pandas examples](https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples). — CrazyChucky, Jul 20 '22 at 12:21

score 0 · Answer 1 · answered Jul 20 '22 at 13:17

Check this out:

pd.read_csv('stackoverflow.csv')
   category book createdby
0  Multiple   NG     panda
1      Cert   DG     koala
2       Cap   PI    monkey
3  CertPlus   ZZ     panda
4       Cap   ll      joey
5      Cert   OT     koala
6       Cap   FG     koala
7      Cert   PI     koala
8     Block   SI     koala
9       Cap   II     koala
df.query("~(category in ['Cert', 'Cap'] and book in ['OT', 'FG', 'PI', 'II'] and createdby=='koala')")
   category book createdby
0  Multiple   NG     panda
1      Cert   DG     koala
2       Cap   PI    monkey
3  CertPlus   ZZ     panda
4       Cap   ll      joey
8     Block   SI     koala

pd.DataFrame.query can be used to filter data, the ~ at the beginning is a not operator.

BR E

Python - issue with removing fields from the output

1 Answers1