Counting Frequency of an Aggregate result using pandas

Question

Broadly I have the Smart Meters dataset from Kaggle and I'm trying to get a count of the first and last measure by house, then trying to aggregate that to see how many houses began (or ended) reporting on a given day. I'm open to methods totally different than the line I pursue below.

In SQL, when exploring data I often used something like following:

SELECT Max_DT, COUNT(House_ID) AS HouseCount 
FROM 
(
    SELECT House_ID, MAX(Date_Time) AS Max_DT 
    FROM ElectricGrid GROUP BY HouseID
) MeasureMax 
GROUP BY Max_DT

I'm trying to replicate this logic in Pandas and failing. I can get the initial aggregation like:

house_max = house_info.groupby('House_Id').agg({'Date_Time' :['max']})

However I'm failing to get the outer query. Specifically I don't know what the aggregated column is called. If I do a describe() it shows as Date_Time in the example above. I tried renaming the columns:

house_max.columns = ['House_Id','Max_Date_Time']

I found a StackOverflow discussion about renaming the results of aggregation and attempted to apply it:

house_max.columns = ["_".join(x) for x in house_max.columns.ravel()]

I still find that a describe() returns Date_Time as the column name.

start_end_collate = house_max.groupby('Date_Time_max')['House_Id'].size()

In the rename example my second query fails to find Date_Time or Max_Date_Time. In the later case, the Ravel code it appears to not find House_Id when I run it.

score 1 · Accepted Answer · answered Dec 18 '17 at 02:58

That's seems weird, I would think your code would not be able to find the House_Id field. After you perform your groupby on House_Id it becomes an index which you cannot reference as a column.

This should work:

house_max = house_info.groupby('House_Id').agg({'Date_Time' :['max']})
house_max.columns = ["_".join(x) for x in house_max.columns.ravel()]
start_end_collate = house_max.groupby('Date_Time_max').size()

Alternatively you can just drop the multilevel column:

house_max.columns = house_max.columns.droplevel(0)
start_end_collate = house_max.groupby('max').size()

you got it in one. Thanks. I've been staring at this for a long time so I haven't even tried to go back and comprehend my error but I'll look again tomorrow Thanks again. — Michael Amos, Dec 18 '17 at 03:07

Counting Frequency of an Aggregate result using pandas

1 Answers1