-1
+-------+------------+---------------+-----------------+
| INDEX | SK_ID_CURR | CREDIT_ACTIVE | CREDIT_TYPE     |
+-------+------------+---------------+-----------------+
|     0 |     215354 | Closed        | Consumer credit |
+-------+------------+---------------+-----------------+
|     1 |     215354 | Active        | Credit card     |
+-------+------------+---------------+-----------------
|     2 |     215354 | Active        | Consumer credit |
+-------+------------+---------------+-----------------+
|     3 |     215354 | Active        | Credit card     |
+-------+------------+---------------+-----------------+
|     4 |     215354 | Active        | Consumer credit |
+-------+------------+---------------+-----------------+
|     5 |     215354 | Active        | Credit card     |
+-------+------------+---------------+-----------------+
|     6 |     215354 | Active        | Consumer credit |
+-------+------------+---------------+-----------------+
|     7 |     162297 | Closed        | Consumer credit |
+-------+------------+---------------+-----------------+
|     8 |     162297 | Closed        | Consumer credit |
+-------+------------+---------------+-----------------+
|     9 |     162297 | Active        | Credit card     |
+-------+------------+---------------+-----------------+
|    10 |     162297 | Active        | Credit card     |
+-------+------------+---------------+-----------------+
|    11 |     162297 | Closed        | Consumer credit |
+-------+------------+---------------+-----------------+
|    12 |     162297 | Active        | Mortgage        |
+-------+------------+---------------+-----------------+
|    13 |     402440 | Active        | Consumer credit |
+-------+------------+---------------+-----------------+
|    14 |     238881 | Closed        | Credit card     |
+-------+------------+---------------+-----------------+

I have the table above. I'd like to aggregate each column per id. For example I need to count the number of active and closed credits per SK_ID_CURR, and then make a column for active_credits and closed_credits, with the counted values. And the same for CREDIT_TYPE.

like:

SK_ID_CURR CREDIT_ACTIVE CREDIT_CLOSED CONSUMER_CREDIT CREDIT_CARD
215354       6                  1           4             3
Mr. Polywhirl
  • 42,981
  • 12
  • 84
  • 132
hk_03
  • 192
  • 3
  • 12
  • 1
    the edit is not allowing the load of the dataframe from clipboard – Yuca Jul 31 '18 at 13:06
  • I didnt edit it. Try this please: SK_ID_CURR CREDIT_ACTIVE CREDIT_TYPE 0 215354 Closed Consumer credit 1 215354 Active Credit card 2 215354 Active Consumer credit 3 215354 Active Credit card 4 215354 Active Consumer credit 5 215354 Active Credit card 6 215354 Active Consumer credit 7 162297 Closed Consumer credit 8 162297 Closed Consumer credit 9 162297 Active Credit card 10 162297 Active Credit card 11 162297 Closed Consumer credit 12 162297 Active Mortgage 13 402440 Active Consumer credit 14 238881 Closed Credit card – hk_03 Jul 31 '18 at 13:07
  • 1
    I know you didn't :) – Yuca Jul 31 '18 at 13:08

4 Answers4

4

For this dataframe:

d={'SK_ID_CURR':[215354, 215354, 215354, 215354, 215354, 215354, 215354, 162297, 162297, 162297, 162297, 162297, 162297,402440 ,238881],
   'CREDIT_ACTIVE':['Closed', 'Active', 'Active', 'Active', 'Active', 'Active', 'Active', 'Closed', 'Closed', 'Active', 'Active', 'Closed', 'Active', 'Active', 'Closed' ],
   'CREDIT_TYPE':['Consumer credit', 'Credit card', 'Consumer credit', 'Credit card', 'Consumer credit', 'Credit card', 'Consumer credit', 'Consumer credit', 'Consumer credit', 'Credit card', 'Credit card', 'Consumer credit',                      'Mortgage', 'Consumer credit', 'Credit card', ]}
df=pd.DataFrame(d)

print(df)

Output:

    SK_ID_CURR CREDIT_ACTIVE      CREDIT_TYPE
0       215354        Closed  Consumer credit
1       215354        Active      Credit card
2       215354        Active  Consumer credit
3       215354        Active      Credit card
4       215354        Active  Consumer credit
5       215354        Active      Credit card
6       215354        Active  Consumer credit
7       162297        Closed  Consumer credit
8       162297        Closed  Consumer credit
9       162297        Active      Credit card
10      162297        Active      Credit card
11      162297        Closed  Consumer credit
12      162297        Active         Mortgage
13      402440        Active  Consumer credit
14      238881        Closed      Credit card

You can try something like this:

aggregations = {
        'CREDIT_ACTIVE': { # work on this column, 
            'CREDIT_ACTIVE': lambda x: list(x).count('Active'),
            'CREDIT_CLOSED': lambda x: list(x).count('Closed') 
        },
        'CREDIT_TYPE': { # work on this column, 
            'CONSUMER_CREDIT ': lambda x: list(x).count('Consumer credit'),
            'CREDIT_CARD': lambda x: list(x).count('Credit card') 
        }}
temp=df.groupby('SK_ID_CURR').agg(aggregations).reset_index()
temp.columns = [e[1] for e in temp.columns.tolist()] 

print(temp)

Output:

           CREDIT_ACTIVE  CREDIT_CLOSED  CONSUMER_CREDIT   CREDIT_CARD
0  162297              3              3                 3            2
1  215354              6              1                 4            3
2  238881              0              1                 0            1
3  402440              1              0                 1            0
Ala Tarighati
  • 3,507
  • 5
  • 17
  • 34
1

Another way, slightly tedious perhaps but may be introduced to some different things.

creditClosed = df[df.CREDIT_ACTIVE == 'Closed']
creditOpened = df[df.CREDIT_ACTIVE == 'Active']
creditTypeCo = df[df.CREDIT_TYPE == 'Credit card']
creditTypeCr = df[df.CREDIT_TYPE == 'Consumer credit']

a = creditClosed.groupby(['SK_ID_CURR']).agg({'CREDIT_ACTIVE':'count'}).reset_index()
b = creditOpened.groupby(['SK_ID_CURR']).agg({'CREDIT_ACTIVE':'count'}).reset_index()
c = creditTypeCo.groupby(['SK_ID_CURR']).agg({'CREDIT_TYPE':'count'}).reset_index()
d = creditTypeCr.groupby(['SK_ID_CURR']).agg({'CREDIT_TYPE':'count'}).reset_index()

ab = pd.merge(a, b, how = 'outer', on = 'SK_ID_CURR')
abc = pd.merge(ab, c, how = 'outer', on = 'SK_ID_CURR')
final = pd.merge(abc, d, how = 'outer', on = 'SK_ID_CURR')

final.rename(columns = {'CREDIT_ACTIVE_x': 'CREDIT_CLOSED', 'CREDIT_ACTIVE_y': 'CREDIT_ACTIVE', 'CREDIT_TYPE_x': 'CREDIT_CARD', 'CREDIT_TYPE_y': 'CONSUMER_CREDIT'}, inplace = True)
final.fillna(0)

Output:

           CREDIT_ACTIVE  CREDIT_CLOSED  CONSUMER_CREDIT   CREDIT_CARD
0  162297              3              3                 3            2
1  215354              6              1                 4            3
2  238881              0              1                 0            1
3  402440              1              0                 1            0
Ankur Sinha
  • 6,473
  • 7
  • 42
  • 73
0

You can use pd.get_dummies(df.drop(columns=['SK_ID_CURR'])) to generate dummy columns like this: dummies of the dataframe

Concatenate it with "SK_ID_CURR" column and then you can group by "SK_ID_CURR". After that aggregate the data by sum using agg([sum]). Finally it's a matter of renaming columns meaningfully.

Sample code in python using pandas:

a = pd.get_dummies(df.drop(columns=['SK_ID_CURR']))
a = pd.concat([df.SK_ID_CURR, a], axis=1)
b = a.groupby(a.SK_ID_CURR).agg([sum])
b.columns = ['CREDIT_Active','CREDIT_Closed', 'Consumer_Credit', 'Credit_Card','Credit_Mortgage']
b.reset_index(inplace=True)
0

You can join a couple of pd.crosstab results after constructing a helper column.

Data from @AllaTarighati.

df['TYPE'] = np.where(df['CREDIT_TYPE'].str.contains('credit', case=False, na=False),
                      'Credit', 'Mortgage')

cross1 = pd.crosstab(df['SK_ID_CURR'], df['TYPE'] + '_' + df['CREDIT_ACTIVE'])
cross2 = pd.crosstab(df['SK_ID_CURR'], df['CREDIT_TYPE'])
res = cross1.join(cross2)

Result

print(res)

            Credit_Active  Credit_Closed  Mortgage_Active  Consumer credit  \
SK_ID_CURR                                                                   
162297                  2              3                1                3   
215354                  6              1                0                4   
238881                  0              1                0                0   
402440                  1              0                0                1   

            Credit card  Mortgage  
SK_ID_CURR                         
162297                2         1  
215354                3         0  
238881                1         0  
402440                0         0  
jpp
  • 159,742
  • 34
  • 281
  • 339