1

My data is broken up into 4 columns and looks like:

State       Year        Month        Value
AK          2010         1             10
AK          2010         3             20
AK          2011         1             28
AK          2011         5             29
AK          2011         12            31
.
.
TX          2010         2             10
TX          2010         3             11
TX          2010         4             20
TX          2010         12            22
TX          2011         4             30
TX          2011         7             33
.
.

I want to fill the missing Months with repetitions of the previous Values of the same Year because they are just cumulative sums that I've added together.

The months do not always begin back at Month 1 and sometimes can be missing full years so I need to address this.

Ie: TX can start at Month 4 in 2011 etc...

The desired output looks like:

State       Year        Month        Value
AK          2010         1             10
AK          2010         2             10
AK          2010         3             20
AK          2010         4             20
AK          2010         5             20
.
.
AK          2010         12            20
AK          2011         1             28
AK          2011         2             28
.
.
TX          2010         1             9
TX          2010         2             10
TX          2010         3             11
TX          2010         4             20
TX          2010         5             20
.
.
TX          2010         12            22
HelloToEarth
  • 2,027
  • 3
  • 22
  • 48
  • Do you need the same Year - Month span for each state? Or are they separate for each state? – ALollz Nov 11 '18 at 00:21
  • 1
    Will every sequence begin with 1 always? – cs95 Nov 11 '18 at 00:23
  • They differ from State to State (AK may start at 1980 but TX or LA can start in 1991) but I need each Year to span the full 12 months for every State. – HelloToEarth Nov 11 '18 at 00:23
  • Good question, @coldspeed. They do not begin at 1 always but I need them filled from the last month's value and automatically fill each from 1-12. I have changed my question to better address this. – HelloToEarth Nov 11 '18 at 00:25

1 Answers1

1

One solution is to use Categorical Data:

# convert Month to categorical with 1-12 range
df['Month'] = pd.Categorical(df['Month'], categories=range(1, 13))

# groupby to give Cartesian product for categorical columns
df = df.groupby(['State', 'Year', 'Month']).first().reset_index()

# forward fill by group
df['Value'] = df.groupby('State')['Value'].ffill()

This solution assumes Dec-2010 data can spill over to null data for Jan-2011 for a particular state.

jpp
  • 159,742
  • 34
  • 281
  • 339
  • I actually have a strange nan issue. I've posted a screenshot above. Looks like it's not picking up previous years at times but it's because of what you had mentioned. Is there a way to spill over previous years? – HelloToEarth Nov 11 '18 at 01:16
  • @HelloToEarth, Unfortunately, I can't replicate without a [mcve]. See [How to make good reproducible pandas examples](https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples) if you need help with this (images / links don't help). – jpp Nov 11 '18 at 01:21