I am trying so summarise a dataframe like this one:
data = {
'pt_ID': [0,1,2,3,4,5,6,7,8,9],
'UNIQUE_ID': [1,1,1,1,2,2,2,2,2,3],
'prediction': ['Scrub100','Scrub100','Rush100','Grass',
'Grass','Grass','Grass','Scrub100','Building',
'Building']
}
inputdf = pd.DataFrame(data)
The desired output would be an aggregation of column 'UNIQUE_ID' and the count of column 'prediction'. Each category in column 'prediction' should be a column in the new dataframe, like the example below:
data = {
'UNIQUE_ID': [1,2,3],
'Scrub100': [2,1,0],
'Rush100': [1,0,0],
'Grass': [1,3,0],
'Building':[0,1,1]
}
desiredDF = pd.DataFrame(data)
I have tried to use pandas pivot and crosstab, but I get
summary2 = inputdf.pivot('UNIQUE_ID','prediction')
# ValueError: Index contains duplicate entries, cannot reshape
summary2 = pd.crosstab(index = "UNIQUE_ID", columns = 'prediction')
# ValueError: If using all scalar values, you must pass an index
The closest I could get was:
summary2 = inputdf.groupby(['UNIQUE_ID','prediction']).count().unstack(0).fillna(0).astype(int).T
Which does not seem to be the best method. Can anyone help, please?