I have a flat table of records about objects. Object have a type (ObjType) and are hosted in containers (ContainerId). The records also have some other attributes about the objects. However, they are not of interest at present. So, basically, the data looks like this:
Id ObjName XT ObjType ContainerId
2 name1 x1 A 2
3 name2 x5 B 2
22 name5 x3 D 7
25 name6 x2 E 7
35 name7 x3 G 7
..
..
92 name23 x2 A 17
95 name24 x8 B 17
99 name25 x5 A 21
What I am trying to do is 're-pivot' this data to further analyze which containers are 'similar' by looking at the types of objects they host in aggregate.
So, I am looking to convert the above data to the form below:
ObjType A B C D E F G
ContainerId
2 2.0 1.0 1.0 0.0 0.0 0.0 0.0
7 0.0 0.0 0.0 1.0 2.0 1.0 1.0
9 1.0 1.0 0.0 1.0 0.0 0.0 0.0
11 0.0 0.0 0.0 2.0 3.0 1.0 1.0
14 1.0 1.0 0.0 1.0 0.0 0.0 0.0
17 1.0 1.0 0.0 0.0 0.0 0.0 0.0
21 1.0 0.0 0.0 0.0 0.0 0.0 0.0
This is how I have managed to do it currently (after a lot of stumbling and using various tips from questions such as this one). I am getting the right results but, being new to Pandas and Python, I feel that I must be taking a long route. (I have added a few comments to explain the pain points.)
import pandas as pd
rdf = pd.read_csv('.\\testdata.csv')
#The data in the below group-by is all that is needed but in a re-pivoted format...
rdfg = rdf.groupby('ContainerId').ObjType.value_counts()
#Remove 'ContainerId' and 'ObjType' from the index
#Had to do reset_index in two steps because otherwise there's a conflict with 'ObjType'.
#That is, just rdfg.reset_index() does not work!
rdx = rdfg.reset_index(level=['ContainerId'])
#Renaming the 'ObjType' column helps get around the conflict so the 2nd reset_index works.
rdx.rename(columns={'ObjType':'Count'}, inplace=True)
cdx = rdx.reset_index()
#After this a simple pivot seems to do it
cdf = cdx.pivot(index='ContainerId', columns='ObjType',values='Count')
#Replacing the NaNs because not all containers have all object types
cdf.fillna(0, inplace=True)
Ask: Can someone please share other possible approaches that could perform this transformation?