
I have a DataFrame like this:

Date;   AK; AL ........

12/31/1976;  128,661;    954,940 

3/31/1977;   128,341;    963,555

.........

The DataFrame shape is (156, 56).

These are quarterly rolling-average figures for the 53 U.S. territories, and I need to duplicate each row of the DataFrame to turn the quarterly data into monthly data.

So it should be like this:

12/31/1976;  128,661;    954,940 ......

1/31/1977;   128,661;    954,940

2/28/1977;   128,661;    954,940

3/31/1977;   128,341;    963,555

4/30/1977;   128,341;    963,555

5/31/1977;   128,341;    963,555

...............

So the ending Data Frame would be (156*3, 56) = (468,56).

Here is my shamefully amateurish way of solving the problem:

result=[]

for d in range(dfc.shape[0]):
    a=dfc.loc[[d]]
    result.append(a)
    for i in range(2):
        result.append(a)

result2 = pd.concat(result)

result2.to_csv(outputfile)

Now I have a list of 474 data frames in result, and I can successfully concatenate them into result2. But is there a more Pythonic way of doing this?

Thank you very much for your time.

Sample Data from input csv

Date AK AL AR AZ CA CO CT DC DE FL GA HI IA ID IL IN KS KY LA MA MD ME MI MN MO MS MT NC ND NE NH NJ NM NV NY OH OK OR PA PR RI SC SD TN TX US UT VA VI VT WA WI WV WY US

12/31/1976 128661 954940 553053 621466 7130131 808768 1194789 350566 213905 2615803 1462638 326404 848553 234033 3803577 1683495 651434 879378 1101983 1942755 1133973 299863 2999407 1425506 1472189 563727 219449 1736735 158068 454897 272603 2247374 284290 233236 5677756 3768974 757678 803867 3796384 456596 326356 836472 166527 1279266 3905285 68009341 362019 1449598 - 136259 1052788 1626165 481509 118196 136018680

3/31/1977 128341 963555 559382 632022 7210477 818252 1203495 349061 212093 2637798 1478518 329504 859381 237540 3829280 1700039 657837 886421 1110438 1950984 1140207 302194 3033862 1444873 1482550 569446 221903 1751718 159539 460068 276727 2254050 288767 239391 5685289 3785281 765835 816312 3807158 457408 329745 842357 168075 1289540 3953044 68563641 367915 1462887 - 137377 1069036 1640823 485301 120550 137127279

6/30/1977 126396 977083 567917 643876 7305609 829959 1215449 349629 212099 2672554 1495769 332130 869226 241135 3858154 1721593 665523 898318 1122502 1964295 1154737 304645 3069330 1463964 1497019 576081 223573 1772303 161208 464668 278415 2271529 293668 245175 5707264 3815464 774473 829472 3826951 455636 332956 850164 169482 1305168 4003226 69279773 373785 1479718 7696 138750 1087648 1660930 492362 123099 138559545

Omi Slash

1 Answer

I think you can use resample with Resampler.ffill. There is a problem with the last values, though: you need to manually add a final row whose date is shifted forward by 2 months and whose values are the same as the last row of the original DataFrame.

# convert the column to datetime
df.Date = pd.to_datetime(df.Date)

# append a copy of the last row under a new index label
df.loc[df.index[-1] + 1] = df.iloc[-1]
# shift the 'Date' of the new last row forward by 2 months
df.loc[df.index[-1], 'Date'] = df.loc[df.index[-1], 'Date'] + pd.offsets.DateOffset(months=2)
print (df)
        Date       AK       AL
0 1976-12-31  128,661  954,940
1 1977-03-31  128,341  963,555
2 1977-05-31  128,341  963,555

df = df.set_index('Date').resample('M').ffill()
print (df)
                 AK       AL
Date                        
1976-12-31  128,661  954,940
1977-01-31  128,661  954,940
1977-02-28  128,661  954,940
1977-03-31  128,341  963,555
1977-04-30  128,341  963,555
1977-05-31  128,341  963,555
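
For readers who want to run the steps above end to end, here is a minimal, self-contained sketch. It is not the exact code from this answer: the two-row frame is made up from the sample values in the question, the names `padded`, `unpadded` and `monthly` are only illustrative, and `pd.offsets.MonthEnd()` is passed to `resample` instead of the `'M'` string on the assumption that an offset object behaves the same as the month-end alias.

```python
import pandas as pd

# Two quarters taken from the question's sample data (illustrative subset).
df = pd.DataFrame({
    "Date": ["12/31/1976", "3/31/1977"],
    "AK": [128661, 128341],
    "AL": [954940, 963555],
})
df["Date"] = pd.to_datetime(df["Date"], format="%m/%d/%Y")

# Without padding, resampling stops at the last quarter-end date.
unpadded = df.set_index("Date").resample(pd.offsets.MonthEnd()).ffill()

# Copy the last row and move its date two month-ends forward.
last = df.iloc[[-1]].copy()
last["Date"] = last["Date"] + pd.offsets.MonthEnd(2)
padded = pd.concat([df, last], ignore_index=True)

# Upsample to month-end frequency and forward-fill the quarterly values.
monthly = padded.set_index("Date").resample(pd.offsets.MonthEnd()).ffill()

print(len(unpadded), len(monthly))
print(monthly)
```

Comparing the two lengths shows why the extra row is needed: the unpadded result is two months short of three rows per quarter.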

If you use an old version of pandas, you need the parameter fill_method='ffill' instead of .ffill() (see the API changes in 0.18.0):

df = pd.read_csv('quarter to month.csv', thousands=',')
print (df) 
           Date      AK       AL       AR       AZ        CA       CO  \
0    12/31/1976  128661   954940   553053   621466   7130131   808768   
1     3/31/1977  128341   963555   559382   632022   7210477   818252   
2     6/30/1977  126396   977083   567917   643876   7305609   829959   
3     9/30/1977  121677   992007   576480   657475   7403502   844079   
...
...  

df.Date = pd.to_datetime(df.Date)

# append a copy of the last row under a new index label
df.loc[df.index[-1] + 1] = df.iloc[-1]
# shift the 'Date' of the new last row forward by 2 months
df.loc[df.index[-1], 'Date'] = df.loc[df.index[-1], 'Date'] + pd.offsets.DateOffset(months=2)

df = df.set_index('Date').resample('M', fill_method='ffill')
print (df)
               AK       AL       AR       AZ        CA       CO       CT  \
Date                                                                        
1976-12-31  128661   954940   553053   621466   7130131   808768  1194789   
1977-01-31  128661   954940   553053   621466   7130131   808768  1194789   
1977-02-28  128661   954940   553053   621466   7130131   808768  1194789   
1977-03-31  128341   963555   559382   632022   7210477   818252  1203495   
1977-04-30  128341   963555   559382   632022   7210477   818252  1203495   
1977-05-31  128341   963555   559382   632022   7210477   818252  1203495   
1977-06-30  126396   977083   567917   643876   7305609   829959  1215449   
1977-07-31  126396   977083   567917   643876   7305609   829959  1215449   
1977-08-31  126396   977083   567917   643876   7305609   829959  1215449   
1977-09-30  121677   992007   576480   657475   7403502   844079  1227102   
1977-10-31  121677   992007   576480   657475   7403502   844079  1227102   
1977-11-30  121677   992007   576480   657475   7403502   844079  1227102   
1977-12-31  120632  1005809   585722   672041   7543093   863180  1242052   
...
...

Explanation:

resample stops at the last date in the index, so the two months after the final quarter are omitted and you need to add a row to the DataFrame manually to get the desired output. First find the last index value with df.index[-1] (the index is monotonic: 0, 1, 2, 3, ... and contains only integers). Then add 1 to get the label of the new row, e.g. if the last index is 50, the new row's index is 51. Then expand the DataFrame with loc and assign the same values as the last row; I use iloc to select the last row. Next, change the value of the Date column in the new last row: select it with df.loc[df.index[-1], 'Date'] and add two months with an offset. Then you can use resample to get the new monthly rows; at the end, the forward fill also creates the row between the original last row and the appended row with the shifted date.
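
If the padding step feels awkward, a different technique (not the resample approach explained above) is to repeat every row three times and then rebuild the dates. The sketch below assumes a default integer index, assumes every quarterly date already falls on a month end, and uses made-up names (`repeated`, `steps`) on a two-row subset of the question's data.

```python
import pandas as pd

# Two quarters taken from the question's sample data (illustrative subset).
df = pd.DataFrame({
    "Date": pd.to_datetime(["12/31/1976", "3/31/1977"], format="%m/%d/%Y"),
    "AK": [128661, 128341],
    "AL": [954940, 963555],
})

# Repeat each index label three times (0, 0, 0, 1, 1, 1) and select those rows.
repeated = df.loc[df.index.repeat(3)].reset_index(drop=True)

# Month steps 0, 1, 2 for each original row; MonthEnd(0) leaves a month-end date unchanged.
steps = [0, 1, 2] * len(df)
repeated["Date"] = [
    date + pd.offsets.MonthEnd(step)
    for date, step in zip(repeated["Date"], steps)
]

print(repeated)
```

This keeps exactly three rows per quarter without appending an extra row, at the cost of computing the monthly dates by hand.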

jezrael
  • Before running my code, try resetting the index with `df = df.reset_index()` – jezrael Nov 03 '16 at 11:17
  • When I run your code, I got "DataError: No numeric types to aggregate" – Omi Slash Nov 03 '16 at 12:07
  • what is your pandas version? `print (pd.show_versions())` . What is `df.info()` ? – jezrael Nov 03 '16 at 12:09
  • Can you send me your data by email (from my profile) ? – jezrael Nov 03 '16 at 12:15
  • Thank you very much for the solution. I have learned a few new tricks for DataFrame manipulation. @jezrael – Omi Slash Nov 03 '16 at 13:10
  • I looked at your solution and the bulk of your work is in this line: `df = df.set_index('Date').resample('M').ffill()` So if you could expand a little bit about the line of code above, and perhaps this line: `df.loc[df.index[-1],'Date']=df.loc[df.index[-1],'Date'] + pd.offsets.DateOffset(months=2)` for the benefits of those who will read this later, and myself right now as well (I am reading pandas documentation right now to understand your code, and..the migraine!). Thanks @jezrael – Omi Slash Nov 03 '16 at 13:51
  • sure, give me a sec. – jezrael Nov 03 '16 at 14:16