0

Scenario: I have a dataframe in which one of the rows is empty. Its value should be a sum of two of the other rows.

Data Input:

    ISIC2   2018    2019    2020    2021
0   A0      68      95      98      39
1   B0      95      19      5       98
2   B1                               
3   B2      58      86      10      90
9   C0      36      74      53      97

Expected output:

    ISIC2   2018    2019    2020    2021
0   A0      68      95      98      39
1   B0      95      19      5       98
2   B1      153     105     15      188
3   B2      58      86      10      90
9   C0      36      74      53      97

Here, where row ISIC2 == "B1", it should give the value of B0+B2 for each column.

What I tried: I was trying to do this in a loop with fixed references for the rows, but that does not seem to be a very effective way to do this:

year_list_2 = [2018,2019,2020,2021]
for year_var in year_list_2:
    for index1 in step4_1.iterrows():
        if step4_1.at[index1, "ISIC2"] == "B1":
            step4_1.at[index1, year_var] = step4_1.at[index1 - 1, year_var] + step4_1.at[index1 + 1, year_var]

Question: Is there a simpler way to do this?

DGMS89
  • 1,507
  • 6
  • 29
  • 60
  • Are the rows to add always the two above and below the blank row? – Barmar Aug 07 '23 at 20:02
  • You should be able to use the `.shift()` method to access the previous and next rows. – Barmar Aug 07 '23 at 20:04
  • @Barmar In this case yes, but it would be nice if there was some way to select the rows to be added based on a condition, e.g. the value of column ISIC2, where i always need to sum B0 and B2 independent of the order they appear. – DGMS89 Aug 07 '23 at 20:05
  • @RomanPerekhrest My bad, I will fix the typo – DGMS89 Aug 07 '23 at 20:06
  • 1
    @DGMS89 My comment was made before you fixed that typo, so I thought the criteria was that all the columns were blank and there was no other relationship. – Barmar Aug 07 '23 at 20:11
  • 1
    Why not to use something like `df.set_index('ISIC2'); df.loc['B1'] = df.loc['B0'] + df.loc['B2']` if you know the indexes? – Vitalizzare Aug 07 '23 at 20:22

2 Answers2

1

A possible solution:

idx = df.index[df.iloc[:,1:].isna().all(1)]
df.iloc[idx, 1:] = df.shift().iloc[idx, 1:] + df.shift(-1).iloc[idx, 1:]

Output:

  ISIC2   2018   2019  2020   2021
0    A0   68.0   95.0  98.0   39.0
1    B0   95.0   19.0   5.0   98.0
2    B1  153.0  105.0  15.0  188.0
3    B2   58.0   86.0  10.0   90.0
9    C0   36.0   74.0  53.0   97.0
PaulS
  • 21,159
  • 2
  • 9
  • 26
  • the output section is missing – Liam Aug 07 '23 at 20:09
  • 4
    Didn't downvote, but a good answer should not just be code, but an explanation of how it works. – Barmar Aug 07 '23 at 20:12
  • Thanks, @Barmar, for your comment, but, in this case, the logic involved is so easy and straightforward that dispenses with any explanation, I guess. – PaulS Aug 07 '23 at 20:15
0

You can leverage df.fillna() to fill in the missing row with the sum of the previous and subsequent row, performing a "centred" rolling sum.

# loading data using solution from https://stackoverflow.com/a/53692642/8718701
from io import StringIO
import pandas as pd


d = '''
    ISIC2   2018    2019    2020    2021
0   A0      68      95      98      39
1   B0      95      19      5       98
2   B1                               
3   B2      58      86      10      90
9   C0      36      74      53      97
'''

df = pd.read_csv(StringIO(d), sep='\s+')
df.set_index('ISIC2', inplace=True)

df.fillna(df.fillna(0).rolling(3, center=True).sum(), inplace=True)

print(df.to_markdown())

Returns:

| ISIC2   |   2018 |   2019 |   2020 |   2021 |
|:--------|-------:|-------:|-------:|-------:|
| A0      |     68 |     95 |     98 |     39 |
| B0      |     95 |     19 |      5 |     98 |
| B1      |    153 |    105 |     15 |    188 |
| B2      |     58 |     86 |     10 |     90 |
| C0      |     36 |     74 |     53 |     97 |

In case this doesn't fully work on your actual data, would you mind adapting the example to show where it fails?

Simon David
  • 663
  • 3
  • 13
  • 1
    It's not the preceding 2 rows. `B1 = B0 + B2`, which are the rows before and after. – Barmar Aug 07 '23 at 20:18
  • @Barmar adapted the code, now it works! – Simon David Aug 07 '23 at 20:25
  • IMO you perform too many operations to fill in some rows. Also, you risk to fill in some strayed empty cells as well, not only empty rows. – Vitalizzare Aug 07 '23 at 20:30
  • @Vitalizzare yes an implicit assumption is that there are no empty cells apart from the empty row, OP did not mention that empty cells apart from the empty rows are to be expected – Simon David Aug 07 '23 at 20:36