1

In the below table I am trying to change the figures in the column "Value" if it meets the condition.

Condition: if the "Date" for Identifier A_xcxcxcis < '05/05/2021' then "Value" for both A_xcxcxc and DAB_bcbcbc (the row below it) is changed to zero, else both values remain the same (the two rows are linked to each other).

Then it will perform the same check for BAB_bnbnb, CCC_eetr, D_tyty, and so on. I have shown the expected answer to the table below.

*** Had to make some edits to the original table

New table:

import pandas as pd
df = pd.DataFrame({'IDENTIFIER': ['A_xcxcxc', 'DAB_bcbcbc', 'BAB_bnbnb', 'XYZ_ererte', 'CCC_eetr', 'CZc_rgrg', 'D_tyty', 'sdD_wewerw', 'sdE_tyty', 'Esd_fhg'], 
                   'income': [-30362100.0, 200000.0, -21248077.5, 150000.0, -33843389.2, 200000.0, -40229279.75, 250000.0, -22111384.6, 200000.0],
'Date' : ['03/03/2021', '22/01/2060', '04/03/2021', '22/07/2068', '08/03/2021', '22/11/2065', '05/04/2021', '22/03/2052', '15/10/2025', '22/07/2065']
})

enter image description here

Expected answer:

enter image description here

Noob Geek
  • 409
  • 6
  • 20
user13412850
  • 509
  • 1
  • 5
  • 16
  • Do not provide your data as a picture, we cannot copy and use it. – Erfan Oct 08 '20 at 10:45
  • yeh, have been trying to fix that past 10 mins. everytime I paste from my excel it goes in as picture – user13412850 Oct 08 '20 at 10:47
  • Here's a good guide [link](https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples), basically just use `pd.DataFrame..` so we can copy and paste the code. – Erfan Oct 08 '20 at 10:48

1 Answers1

2

I think you want:

df.loc[pd.to_datetime(df['Date'], format='%d/%m/%Y')
         .groupby(df['IDENTIFIER'].str.split('_').str[0])
         .transform('first')
         .lt(pd.to_datetime('05/05/2021', format='%d/%m/%Y')), 'income'] = 0
print(df)

  IDENTIFIER      income        Date
0   A_xcxcxc         0.0  03/03/2021
1   A_bcbcbc         0.0  22/01/2060
2    B_bnbnb         0.0  04/03/2021
3   B_ererte         0.0  22/07/2068
4     C_eetr         0.0  08/03/2021
5     C_rgrg         0.0  22/11/2065
6     D_tyty         0.0  05/04/2021
7   D_wewerw         0.0  22/03/2052
8     E_tyty -22111384.6  15/10/2025
9      E_fhg    200000.0  22/07/2065

If you want groupby each two rows:

df.loc[pd.to_datetime(df['Date'], format='%d/%m/%Y')
         .groupby(np.arange(df.shape[0])//2)
         .transform('first')
         .lt(pd.to_datetime('05/05/2021', format='%d/%m/%Y')), 'income'] = 0
print(df)
ansev
  • 30,322
  • 5
  • 17
  • 31
  • thanks @ansev, that worked well just going through my file and there are over 1000 rows. some of the "row - pairs" dont have the same first alphabet. so if you replace your row 1 Identifier with something else say DFT_xxxtd your solution would work. am amending the original question as well. – user13412850 Oct 08 '20 at 11:18
  • ok, so now the problem comes down to finding an object to use with GroupBy :) – ansev Oct 08 '20 at 11:19
  • i have changed the orignal table now in the question. – user13412850 Oct 08 '20 at 11:27
  • I think @ansev already assume this fact and had written the code accordingly. – Noob Geek Oct 08 '20 at 11:32
  • no, it doesnt give me the answer when I change the names under Identifier – user13412850 Oct 08 '20 at 11:35