I am having some issues with Python. I am not new to programming, but I am with the python language in general. Currently I have a DF that looks like the below.
Current data frame
CONTENT_KEY Age_Bucket Count
0 11005 F(25-34) 2
1 11005 F(45-49) 5
2 11005 M(65+) 8
3 11010 F(18-20) 5
4 11010 F(50-54) 4
5 11015 M(18-20) 3
6 11015 F(65+) 2
The target Data-frame is below
New Data-frame
CONTENT_KEY F(18-20) M(18-20) F(21-24) M(21-24) F(25-34) ....>
0 11005 0 0 0 0 2
3 11010 5 0 0 0 0
5 11015 0 3 0 0 0
The goal is to change the Age_buckets to columns and transpose the data while collapsing the duplicate CONTENT_KEY's. I previously looked into the transpose utility but I could not make it work for this. I ended up doing nested for loops along with a mask utilizing the indexing of the data-frame. It works currently but if you have more than 15k+ rows it becomes inefficient to say the least.
Any help and tips would be appreciated. I have my solution below along with a sample df to illustrate my current workaround solution.
import pandas as pd
import numpy as np
data = {'CONTENT_KEY':[11005,11005,11005,11010,11010,11015,11015],
'Age_Bucket':['F(25-34)','F(45-49)','M(65+)','F(18-20)','F(50-54)','M(18-20)','F(65+)' ],
'Count':[2,5,8,5,4,3,2]}
df = pd.DataFrame(data, columns = ['CONTENT_KEY', 'Age_Bucket','Count'])
new_df = pd.DataFrame()
new_df = df.copy()
new_df.drop(['Age_Bucket', 'Count'], axis =1, inplace = True)
new_df.drop_duplicates(keep='first', inplace=True)
new_df['F(18-20)'] = 0
new_df['M(18-20)'] = 0
new_df['F(21-24)'] = 0
new_df['M(21-24)'] = 0
new_df['F(25-34)'] = 0
new_df['M(25-34)'] = 0
new_df['F(35-44)'] = 0
new_df['M(35-44)'] = 0
new_df['F(45-49)'] = 0
new_df['M(45-49)'] = 0
new_df['F(50-54)'] = 0
new_df['M(50-54)'] = 0
new_df['F(55-64)'] = 0
new_df['M(55-64)'] = 0
new_df['F(65+)'] = 0
new_df['M(65+)'] = 0
new_df['F(Unknown)'] = 0
new_df['M(Unknown)'] = 0
for itemA in new_df['CONTENT_KEY']:
for index in range(len(df)):
itemB = df.iloc[index,0]
if(itemA == itemB):
string_A = df.iloc[index,1]
int_B = df.iloc[index,2]
mask = (new_df['CONTENT_KEY'] == itemA)
new_df[string_A][mask] = int_B
print(new_df)