0

I would like to reproduce the "desired_outcome" column using Pandas. Essentially every time the "Acc Type" is equal to O, I would have to take the sum of the Balance and the Amount.

+--------+----------+-------+---------+--------+----------+-----------------+
| MainID |   Date   | SubID | Balance | Amount | Acc Type | desired_outcome |
+--------+----------+-------+---------+--------+----------+-----------------+
|      1 | 1/1/2020 |     1 |      10 | 5      | O        |              15 |
|      1 | 1/1/2020 |     1 |      10 | 4      | R        |              10 |
|      1 | 1/1/2020 |     2 |      20 | 5      | O        |              25 |
|      1 | 1/1/2020 |     2 |      20 | 4      | R        |              20 |
|      1 | 1/1/2020 |     3 |      30 | 5      | O        |              35 |
|      1 | 1/1/2020 |     3 |      30 | 4      | R        |              30 |
|      1 | 2/1/2020 |     1 |      40 | 5      | O        |              45 |
|      1 | 2/1/2020 |     1 |      40 | 4      | R        |              40 |
|      1 | 2/1/2020 |     2 |      50 | 5      | O        |              55 |
|      1 | 2/1/2020 |     2 |      50 | 4      | R        |              50 |
|      1 | 2/1/2020 |     3 |      60 | 5      | O        |              65 |
|      1 | 2/1/2020 |     3 |      60 | 4      | R        |              60 |
|      2 | 1/1/2020 |     7 |     100 | NaN    | O        |             100 |
|      2 | 1/1/2020 |     7 |     100 | NaN    | R        |             100 |
+--------+----------+-------+---------+--------+----------+-----------------+

Also, I know that this isn't an ideal dataframe and the ideal approach probably would be to have two dataframes. How can I set it up where I would have a second dataframe that looks like the below: and still be able to have the desired_output column as above (without the extra row since acc type will no longer be there)

+--------+----------+------------+----------+
| MainID |   Date   | Acc Amount | Acc Type |
+--------+----------+------------+----------+
|      1 | 1/1/2020 | 5          | O        |
|      1 | 1/1/2020 | 4          | R        |
|      1 | 2/1/2020 | 5          | O        |
|      1 | 2/1/2020 | 4          | R        |
|      2 | 1/1/2020 | NaN        | O        |
|      2 | 1/1/2020 | NaN        | R        |
+--------+----------+------------+----------+

Thanks!

MartinW
  • 25
  • 3

1 Answers1

2

You dataframe is just fine. Here's what I would do:

df['desired_outcome'] = np.where(df['Acc Type']=='O', 
                                 df['Balance'] + df['Amount'].fillna(0),
                                 df['Balance'])

Output:

    MainID      Date  SubID  Balance  Amount Acc Type  desired_outcome
0        1  1/1/2020      1       10     5.0        O             15.0
1        1  1/1/2020      1       10     4.0        R             10.0
2        1  1/1/2020      2       20     5.0        O             25.0
3        1  1/1/2020      2       20     4.0        R             20.0
4        1  1/1/2020      3       30     5.0        O             35.0
5        1  1/1/2020      3       30     4.0        R             30.0
6        1  2/1/2020      1       40     5.0        O             45.0
7        1  2/1/2020      1       40     4.0        R             40.0
8        1  2/1/2020      2       50     5.0        O             55.0
9        1  2/1/2020      2       50     4.0        R             50.0
10       1  2/1/2020      3       60     5.0        O             65.0
11       1  2/1/2020      3       60     4.0        R             60.0
12       2  1/1/2020      7      100     NaN        O            100.0
13       2  1/1/2020      7      100     NaN        R            100.0
Quang Hoang
  • 146,074
  • 10
  • 56
  • 74