How to Merge Columns in Rows in a Dataframe that fulfill a Condition, while deleting the Rows

Question

I don't think I can solve this with groupby() or agg() as in these questions (Question1, Question2).

I have a pandas.DataFrame with one identifier column (ID_Code) and some information columns (information 1 and information 2). I need to aggregate some of the identifiers: some rows have to be deleted, and their information has to be added to specific other rows.

To illustrate my problem, here is something I made up:

import pandas as pd

inp = [{'ID_Code':1,'information 1':list(x * 3 for x in range(2, 5)),'information 2':list(x / 3 for x in range(2, 5))},
       {'ID_Code':2,'information 1':list(x * 0.5 for x in range(2, 5)),'information 2':list(x / 2 for x in range(2, 5))},
       {'ID_Code':3,'information 1':list(x * 0.2 for x in range(25, 29)),'information 2':list(x / 1 for x in range(2, 5))},
       {'ID_Code':4,'information 1':list(x * 0.001 for x in range(102, 105)),'information 2':list(x / 12 for x in range(2, 5))},
       {'ID_Code':5,'information 1':list(x * 12 for x in range(15, 17)),'information 2':list(x / 24 for x in range(2, 5))},
       {'ID_Code':6,'information 1':list(x * 42 for x in range(2, 9)),'information 2':list(x / 48 for x in range(2, 5))},
       {'ID_Code':7,'information 1':list(x * 23 for x in range(1, 2)),'information 2':list(x / 96 for x in range(2, 5))},
       {'ID_Code':8,'information 1':list(x * 7.8 for x in range(8, 11)),'information 2':list(x / 124 for x in range(2, 5))}]

df = pd.DataFrame(inp)

print(df)
Out:
       ID_Code                                                    information 1   information 2
    0        1                                                       [6, 9, 12]   [0.6666666666666666, 1.0, 1.3333333333333333]
    1        2                                                  [1.0, 1.5, 2.0]   [1.0, 1.5, 2.0]
    2        3                              [5.0, 5.2, 5.4, 5.6000000000000005]   [2.0, 3.0, 4.0]
    3        4  [0.10200000000000001, 0.10300000000000001, 0.10400000000000001]   [0.16666666666666666, 0.25, 0.3333333333333333]
    4        5                                                       [180, 192]   [0.08333333333333333, 0.125, 0.16666666666666666]
    5        6                               [84, 126, 168, 210, 252, 294, 336]   [0.041666666666666664, 0.0625, 0.08333333333333333]
    6        7                                                             [23]   [0.020833333333333332, 0.03125, 0.041666666666666664]
    7        8                                               [62.4, 70.2, 78.0]   [0.016129032258064516, 0.024193548387096774, 0.03225806451612903]

What do I need to do if I want to get rid of ID_Code = 1 and store its information in ID_Code = 3, and get rid of ID_Code = 5 and ID_Code = 7 and store their information in ID_Code = 2, so that the DataFrame looks like this:

   ID_Code                                                    information 1   information 2
0        2                                    [180, 192, 23, 1.0, 1.5, 2.0]   [0.08333333333333333, 0.125, 0.16666666666666666, 0.020833333333333332, 0.03125, 0.041666666666666664, 1.0, 1.5, 2.0]
1        3                    [6, 9, 12, 5.0, 5.2, 5.4, 5.6000000000000005]   [0.6666666666666666, 1.0, 1.3333333333333333, 2.0, 3.0, 4.0]
2        4  [0.10200000000000001, 0.10300000000000001, 0.10400000000000001]   [0.16666666666666666, 0.25, 0.3333333333333333]
3        6                               [84, 126, 168, 210, 252, 294, 336]   [0.041666666666666664, 0.0625, 0.08333333333333333]
4        8                                               [62.4, 70.2, 78.0]   [0.016129032258064516, 0.024193548387096774, 0.03225806451612903]

score 1 · Answer 1 · answered Jun 17 '20 at 14:59

You can set ID_Code as the index and update the target row with a list comprehension:

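# Index by ID_Code so rows can be addressed by their ID
df = df.set_index('ID_Code')
# Prepend row 1's lists to row 3's lists, column by column
df.loc[3] = [x + y for x, y in zip(df.loc[1], df.loc[3])]
# Remove the row that was merged away
df = df.drop(1)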
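
The snippet above covers only the merge of ID_Code 1 into 3. As a rough sketch of how the same idea might extend to several merges, including 5 and 7 into 2, the version below loops over a mapping from each target ID to the IDs it absorbs. The `absorb` dictionary and the shortened sample data are assumptions made for illustration, and `.at` is used so that each list is written into a single cell.

```python
import pandas as pd

# Shortened sample data (illustrative, not the full frame from the question)
df = pd.DataFrame({
    "ID_Code": [1, 2, 3, 5, 7],
    "information 1": [[6, 9, 12], [1.0, 1.5, 2.0], [5.0, 5.2], [180, 192], [23]],
    "information 2": [[0.5], [0.75, 1.0], [2.0, 3.0], [0.25], [0.125]],
})

# Hypothetical mapping: target ID -> IDs whose lists it absorbs
absorb = {3: [1], 2: [5, 7]}

indexed = df.set_index("ID_Code")
for target, sources in absorb.items():
    for column in indexed.columns:
        merged = []
        # Source lists come first, in the order given, then the target's own list
        for source in sources:
            merged.extend(indexed.at[source, column])
        merged.extend(indexed.at[target, column])
        # .at writes one cell, so the whole list is stored as a single object
        indexed.at[target, column] = merged

# Drop every row that was merged into another one
absorbed_ids = [s for sources in absorb.values() for s in sources]
result = indexed.drop(index=absorbed_ids).reset_index()
print(result)
```

Putting the source lists before the target's own list reproduces the element order shown in the question's desired output.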

Quang Hoang

Does this need an updated Python? It doesn't update for me when I run it. – Umar.H Jun 17 '20 at 15:06
@Datanovice It should work with most versions of Python/pandas. – Quang Hoang Jun 17 '20 at 15:07

score 1 · Accepted Answer · answered Jun 17 '20 at 15:01

You could conditionally rewrite df['ID_Code'] so that each row to be removed takes the ID of its target row, then group by ID_Code and sum the columns.

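import numpy as np

col = 'ID_Code'
# Conditions that pick out the rows to merge away
cond = [df[col].eq(1),
        df[col].isin([5, 7])]

# Target ID_Code for each condition, in the same order
outputs = [3, 2]

# Relabel the matching rows; all other rows keep their ID_Code
df[col] = np.select(cond, outputs, default=df[col])

# Summing Python lists concatenates them, so each group's lists are joined
df1 = df.groupby(col).sum()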

print(df1)


                                             information 1  \
ID_Code                                                      
2                            [1.0, 1.5, 2.0, 180, 192, 23]   
3            [6, 9, 12, 5.0, 5.2, 5.4, 5.6000000000000005]   
4        [0.10200000000000001, 0.10300000000000001, 0.1...   
6                       [84, 126, 168, 210, 252, 294, 336]   
8                                       [62.4, 70.2, 78.0]   

                                             information 2  
ID_Code                                                     
2        [1.0, 1.5, 2.0, 0.08333333333333333, 0.125, 0....  
3        [0.6666666666666666, 1.0, 1.3333333333333333, ...  
4          [0.16666666666666666, 0.25, 0.3333333333333333]  
6        [0.041666666666666664, 0.0625, 0.0833333333333...  
8        [0.016129032258064516, 0.024193548387096774, 0...
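
A possible variation on this approach, sketched here under assumptions, replaces the condition list with a dictionary and makes the list concatenation explicit instead of relying on `sum()` over lists. The `remap` dictionary, the `concat_lists` helper and the shortened sample data are hypothetical names and values introduced for illustration.

```python
from itertools import chain

import pandas as pd

# Shortened sample data (illustrative, not the full frame from the question)
df = pd.DataFrame({
    "ID_Code": [1, 2, 3, 5, 7],
    "information 1": [[6, 9, 12], [1.0, 1.5, 2.0], [5.0, 5.2], [180, 192], [23]],
    "information 2": [[0.5], [0.75, 1.0], [2.0, 3.0], [0.25], [0.125]],
})

# Hypothetical mapping: ID to delete -> ID that receives its lists
remap = {1: 3, 5: 2, 7: 2}
df["ID_Code"] = df["ID_Code"].replace(remap)


def concat_lists(lists):
    """Join all lists in a group into one flat list, in row order."""
    return list(chain.from_iterable(lists))


# Each information column is aggregated separately within each ID_Code group
merged = df.groupby("ID_Code").agg(concat_lists).reset_index()
print(merged)
```

Within each group the lists are joined in original row order, so ID_Code 2 ends up with its own values first, followed by those from 5 and 7, as in the output above.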
