0

I am having some issues with Python. I am not new to programming, but I am with the python language in general. Currently I have a DF that looks like the below.

Current data frame

CONTENT_KEY Age_Bucket  Count
0        11005   F(25-34)      2
1        11005   F(45-49)      5
2        11005     M(65+)      8
3        11010   F(18-20)      5
4        11010   F(50-54)      4
5        11015   M(18-20)      3
6        11015     F(65+)      2

The target Data-frame is below

New Data-frame
   CONTENT_KEY  F(18-20)  M(18-20)  F(21-24)  M(21-24)  F(25-34) ....>
0        11005         0         0         0         0         2
3        11010         5         0         0         0         0 
5        11015         0         3         0         0         0

The goal is to change the Age_buckets to columns and transpose the data while collapsing the duplicate CONTENT_KEY's. I previously looked into the transpose utility but I could not make it work for this. I ended up doing nested for loops along with a mask utilizing the indexing of the data-frame. It works currently but if you have more than 15k+ rows it becomes inefficient to say the least.

Any help and tips would be appreciated. I have my solution below along with a sample df to illustrate my current workaround solution.

import pandas as pd
import numpy as np

data = {'CONTENT_KEY':[11005,11005,11005,11010,11010,11015,11015],
'Age_Bucket':['F(25-34)','F(45-49)','M(65+)','F(18-20)','F(50-54)','M(18-20)','F(65+)' ],
'Count':[2,5,8,5,4,3,2]}

df = pd.DataFrame(data, columns = ['CONTENT_KEY', 'Age_Bucket','Count'])

new_df = pd.DataFrame()
new_df = df.copy()
new_df.drop(['Age_Bucket', 'Count'], axis =1, inplace = True)
new_df.drop_duplicates(keep='first', inplace=True)
new_df['F(18-20)'] = 0
new_df['M(18-20)'] = 0
new_df['F(21-24)'] = 0
new_df['M(21-24)'] = 0
new_df['F(25-34)'] = 0
new_df['M(25-34)'] = 0
new_df['F(35-44)'] = 0
new_df['M(35-44)'] = 0
new_df['F(45-49)'] = 0
new_df['M(45-49)'] = 0
new_df['F(50-54)'] = 0
new_df['M(50-54)'] = 0
new_df['F(55-64)'] = 0
new_df['M(55-64)'] = 0
new_df['F(65+)'] = 0
new_df['M(65+)'] = 0
new_df['F(Unknown)'] = 0
new_df['M(Unknown)'] = 0

for itemA in new_df['CONTENT_KEY']:
    for index in range(len(df)):
        itemB = df.iloc[index,0]
        if(itemA == itemB):
           string_A = df.iloc[index,1]
           int_B = df.iloc[index,2]
           mask = (new_df['CONTENT_KEY'] == itemA)
           new_df[string_A][mask] = int_B

print(new_df)
Ryan Bowns
  • 47
  • 1
  • 2
  • and for your case it is `df.pivot(index='CONTENT_KEY', columns='Age_Bucket',values='Count').reset_index()` – Ben.T Apr 29 '20 at 22:52
  • 1
    @Ben.T Thank you for the post and link. I didn't think about using a pivot in this case and it is a lot more processing friendly. – Ryan Bowns Apr 30 '20 at 01:15

0 Answers0