1

I am trying so summarise a dataframe like this one:

data = {
    'pt_ID': [0,1,2,3,4,5,6,7,8,9],
    'UNIQUE_ID': [1,1,1,1,2,2,2,2,2,3],
    'prediction': ['Scrub100','Scrub100','Rush100','Grass',
                   'Grass','Grass','Grass','Scrub100','Building',
                   'Building']
    }

inputdf = pd.DataFrame(data)

The desired output would be an aggregation of column 'UNIQUE_ID' and the count of column 'prediction'. Each category in column 'prediction' should be a column in the new dataframe, like the example below:

data = {
    'UNIQUE_ID': [1,2,3],
    'Scrub100': [2,1,0],
    'Rush100': [1,0,0],
    'Grass': [1,3,0],
    'Building':[0,1,1]
    }

desiredDF = pd.DataFrame(data)

I have tried to use pandas pivot and crosstab, but I get

summary2 = inputdf.pivot('UNIQUE_ID','prediction')
# ValueError: Index contains duplicate entries, cannot reshape

summary2 = pd.crosstab(index = "UNIQUE_ID", columns = 'prediction')
# ValueError: If using all scalar values, you must pass an index

The closest I could get was:

summary2 = inputdf.groupby(['UNIQUE_ID','prediction']).count().unstack(0).fillna(0).astype(int).T

Which does not seem to be the best method. Can anyone help, please?

Fjord
  • 144
  • 1
  • 9

0 Answers0