0

I am working on a geospatial project where I need to do some calculations between groups of data within a data frame. The data I am using spans over several different years and specific to the Local Authority District code, each year has a numerical ID.

I need to be able to calculate the mean average of a group of years within that data set relative to the LAD code.

LAC          LAN                JAN     FEB     MAR     APR     MAY     JUN     ID
K04000001    ENGLAND AND WALES  56597   43555   49641   88049   52315   42577   5
E92000001    ENGLAND            53045   40806   46508   83504   49413   39885   5

I can use groupby to calculate the mean based on a LAC, but what I can't do is calculate the mean grouped by LAC for ID 1:3 for example.

What is more efficient, seperate in to seperate dataframes stored in an dict for example, or keep in one dataframe and use an ID?

df.groupby('LAC').mean()

I come frome a matlab background so just getting the hang of the best way to do things.

Secondly, once these operatons are complete, I would like to do the following:

(mean of id - 1:5 - mean id:6) using LAC as the key.

Sorry if I haven't explained this very well!

Edit: Expected output.

To be able to average a group of rows by specific ID for a given value of LAC.

For example: Average monthly values for E92000001 rows with ID 3

LAC         JAN     FEB     MAR     APR     MAY     JUN    ID
K04000001,  56706   43653   49723   88153   52374   42624   5
K04000001   56597   43555   49641   88049   52315   42577   5
E92000001   49186   36947   42649   79645   45554   36026   5
E92000001   53045   40806   46508   83504   49413   39885   3
E92000001   68715   56476   62178   99174   65083   55555   4
E92000001   41075   28836   34538   71534   37443   27915   3
E92000001   54595   42356   48058   85054   50963   41435   1

Rows to be averaged:

E92000001   53045   40806   46508   83504   49413   39885   3
E92000001   41075   28836   34538   71534   37443   27915   3

Result

E92000001   47060   34821   40523   77519   43428   33900   3

edit: corrected error.

Owain
  • 1
  • 1

1 Answers1

0

To match the update in your question. This will give you a dataframe with only one row for each ID-LAC combination, with the average of all the rows that had that index.

df.groupby(['ID', 'LAC']).mean()

I would start by setting the year and LAC as the index

df.set_index(['ID', 'LAC'], inplace=True).sort_index(inplace=True)

Now you can groupby Index and get the mean for every month, or even each row's average since the first year.

expanding_mean = df.groupby('index').cumsum() / (df.groupby('index').cumcount() + 1)
RichieV
  • 5,103
  • 2
  • 11
  • 24