1

I'm using Pandas in Jupyter notebook to compute the following from dataset structure below which contains duplicates by the column 'product'.

| name  | val_id |val_amt | our_val_amt | val_against | product | **NEW FIELD**
| compx |  xx1   | 530    |   424       |    418      |   XL    |   290
| compx |  xx1   | 530    |   424       |    134      |   CL    |   134
| compx |  xx2   | 590    |   472       |    600      |   XL    |   369
| compx |  xx2   | 590    |   472       |    103      |   CL    |   103

I am trying to loop through the 'name' column (only 1 in my example but have more) and perform the following if/then conditions. Before finally structuring the data as per the output.

Conditions

1.if 'product' = 'CL' and is less than 'val_against' insert the 'saved_against' value in 'NEW FIELD'. e.g val_against (134) < our_val_amt (424) therefore 'NEW FIELD' = 134 (row2).

2.if product = 'XL' and is in the same 'val_id' (xx1). The remainder to be subtracted from to be inserted in 'NEW FIELD'. 'our_val_amt' = 424 - 134 (from step 1) = 290. This inserted above NEW FIELD.

3.Repeat steps for val_id xx2. NEW FIELD CL = 103 and XL = 472 - 103 = 369.

4.Last Step restructure the data to desired output below.

I've tried grouping the variables together and having the 'product' as columns but unfortunately i receive 6000+ columns and have no idea on how to compute the calculation for this scenario. i've also attempted to loop through without much success.

Desired output is:

                           val id
                        ---------------
                        | xx1   | xx2 |
                        | our_val_amt |
---------------------------------------
| product  |val_against |  424   | 472 |
----------------------------------------
| XL       |   418      |  290   |     |
| CL       |   134      |  134   |     |
| XL       |   600      |        | 369 |
| CL       |   103      |        | 103 |
Akshay Sehgal
  • 18,741
  • 3
  • 21
  • 51
svenvuko
  • 19
  • 6
  • In fact i believe i need to do a double loop on the column 'name' and then 'val_id' due to the structure of my dataset. I've in addition also tried grouping to make it unique but believe this impacts the calculations i'd like to perform. Thank you for your time. – svenvuko Jan 11 '23 at 12:47
  • seems you can do this with the `.unstack` method. do check and let me know if my answer works for you – Akshay Sehgal Jan 11 '23 at 13:02
  • Could i please ask you to review this question Akshay https://stackoverflow.com/questions/75142082/for-loop-with-conditional-statements-not-working-as-expected-pandas – svenvuko Jan 17 '23 at 04:48
  • Hi Akshay, wondering if you may know how to help with this question? https://stackoverflow.com/q/75216508/20923737 – svenvuko Jan 24 '23 at 11:49

1 Answers1

0

Is this what you are looking for?

col = '**NEW FIELD**'
idx = ['product','val_against','val_id','our_val_amt']

r = df.set_index(idx).unstack([-2,-1])[col].reset_index().fillna('')
r
product val_against    xx1    xx2
                       424    472
     CL         103         103.0
     CL         134  134.0       
     XL         418  290.0       
     XL         600         369.0

NOTE: Remove the last .fillna('') to keep the values as numeric and leave missing values as nan as that would be much better to work with in later steps.

Akshay Sehgal
  • 18,741
  • 3
  • 21
  • 51
  • Unfortunately i was not able replicate the same outcome as you did. the error i'm seeing is in the idx = [ ]. I will try again tomorrow and see if it works. – svenvuko Jan 12 '23 at 12:45
  • oh, what do you mean by `idx = [ ]`? its an input field. you have to define idx as `idx = ['product','val_against','val_id','our_val_amt']` and col as `col = 'NEW FIELD'` – Akshay Sehgal Jan 12 '23 at 12:48
  • can you post the error, and maybe a screenshot of the run you are doing in your question? – Akshay Sehgal Jan 12 '23 at 12:49
  • col = 'NEW FIELD' idx = ['product', 'val_id', 'our_val_amt'] r = df6v2_adel.set_index(idx).unstack([-2,-1])[col].reset_index() r KeyError: 'NEW FIELD' – svenvuko Jan 13 '23 at 07:51
  • as you can see in the error, the `keyerror` is 'NEW FIELD' which means that python is not able to find the column named 'NEW FIELD', whats the actual name of the column that you have in your question `**NEW FIELD**`??, just replace 'NEW FIELD' with that. – Akshay Sehgal Jan 13 '23 at 08:39
  • updated the code, do check and let me know if that works. – Akshay Sehgal Jan 13 '23 at 12:27