pandas repeat rows gaps in dates

Question

I have a pandas dataframe that looks similar to below (I have it in CSV now, since it's on my computer at work):

UNITID,COMPETITORID,COMPETORNAME,PRICE,DATECHANGE
123,555,xyz 1, 2.33,3/3/2013
123,555,xyz 1, 2.34,3/5/2013
123,555,xyz 1, 2.24,3/15/2013
123,666,xyz 2, 4.24,2/15/2013
123,666,xyz 2, 4.44,3/15/2013
123,666,xyz 2, 1.44,3/25/2013
223,777,xyz 3, 2.44,3/25/2013
223,777,xyz 3, 2.54,3/28/2013
223,777,xyz 3, 1.54,3/29/2013

I am trying to fill in the gaps in dates, for example:

123,555,xyz 1, 2.33,3/3/2013
123,555,xyz 1, 2.33,3/4/2013
123,555,xyz 1, 2.34,3/5/2013
123,555,xyz 1, 2.34,3/6/2013
123,555,xyz 1, 2.34,3/7/2013
123,555,xyz 1, 2.34,3/8/2013
.
.

I'm relatively new to pandas and I've seen some similar examples, but can't get them to work. I came up with one solution that was probably inefficient: I copied the date field, shifted it up, subtracted the dates, and then iterated through the rows, repeating each one by the number of days' difference. Probably not the best.

Any ideas/advice?

Thanks.

possible duplicate of [Add missing dates to pandas dataframe](http://stackoverflow.com/questions/19324453/add-missing-dates-to-pandas-dataframe) — Ami Tavory, Jun 06 '15 at 07:59

EdChum · Answer 1 · 2015-06-06T08:04:57.040

I'd load the CSV and parse the 'DATECHANGE' column as datetime, then call set_index on that column, call resample with the param fill_method='ffill' to perform a daily resample with forward fill, and then call reset_index:

In [2]:
import io
import pandas as pd

# load the data
t="""UNITID,COMPETITORID,COMPETORNAME,PRICE,DATECHANGE
123,555,xyz 1, 2.33,3/3/2013
123,555,xyz 1, 2.34,3/5/2013
123,555,xyz 1, 2.24,3/15/2013
123,666,xyz 2, 4.24,2/15/2013
123,666,xyz 2, 4.44,3/15/2013
123,666,xyz 2, 1.44,3/25/2013
223,777,xyz 3, 2.44,3/25/2013
223,777,xyz 3, 2.54,3/28/2013
223,777,xyz 3, 1.54,3/29/2013"""

df=pd.read_csv(io.StringIO(t), parse_dates=['DATECHANGE'])


In [7]:

# daily resample with forward fill (fill_method is the 2015-era pandas spelling)
df.set_index('DATECHANGE').resample('D', fill_method='ffill').reset_index()

Out[7]:

   DATECHANGE  UNITID  COMPETITORID  PRICE
0  2013-02-15     123         666.0   4.24
1  2013-02-16     123         666.0   4.24
2  2013-02-17     123         666.0   4.24
3  2013-02-18     123         666.0   4.24
4  2013-02-19     123         666.0   4.24
5  2013-02-20     123         666.0   4.24
6  2013-02-21     123         666.0   4.24
7  2013-02-22     123         666.0   4.24
8  2013-02-23     123         666.0   4.24
9  2013-02-24     123         666.0   4.24
10 2013-02-25     123         666.0   4.24
11 2013-02-26     123         666.0   4.24
12 2013-02-27     123         666.0   4.24
13 2013-02-28     123         666.0   4.24
14 2013-03-01     123         666.0   4.24
15 2013-03-02     123         666.0   4.24
16 2013-03-03     123         555.0   2.33
17 2013-03-04     123         555.0   2.33
18 2013-03-05     123         555.0   2.34
19 2013-03-06     123         555.0   2.34
20 2013-03-07     123         555.0   2.34
21 2013-03-08     123         555.0   2.34
22 2013-03-09     123         555.0   2.34
23 2013-03-10     123         555.0   2.34
24 2013-03-11     123         555.0   2.34
25 2013-03-12     123         555.0   2.34
26 2013-03-13     123         555.0   2.34
27 2013-03-14     123         555.0   2.34
28 2013-03-15     123         610.5   3.34
29 2013-03-16     123         610.5   3.34
30 2013-03-17     123         610.5   3.34
31 2013-03-18     123         610.5   3.34
32 2013-03-19     123         610.5   3.34
33 2013-03-20     123         610.5   3.34
34 2013-03-21     123         610.5   3.34
35 2013-03-22     123         610.5   3.34
36 2013-03-23     123         610.5   3.34
37 2013-03-24     123         610.5   3.34
38 2013-03-25     173         721.5   1.94
39 2013-03-26     173         721.5   1.94
40 2013-03-27     173         721.5   1.94
41 2013-03-28     223         777.0   2.54
42 2013-03-29     223         777.0   1.54

You have to temporarily set the index to the 'DATECHANGE' column because resample only works with datetime-like indices.
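
On newer pandas releases, as far as I know, resample no longer accepts the fill_method argument, so the call above has to be spelled differently. Here is a minimal sketch of what I believe is the equivalent, assuming a pandas version where resample returns an object you chain .mean() and .ffill() onto. The names csv_text, numeric_cols and daily are only for illustration, and the text column is left out because it cannot be averaged:

```python
import io
import pandas as pd

csv_text = """UNITID,COMPETITORID,COMPETORNAME,PRICE,DATECHANGE
123,555,xyz 1, 2.33,3/3/2013
123,555,xyz 1, 2.34,3/5/2013
123,555,xyz 1, 2.24,3/15/2013
123,666,xyz 2, 4.24,2/15/2013
123,666,xyz 2, 4.44,3/15/2013
123,666,xyz 2, 1.44,3/25/2013
223,777,xyz 3, 2.44,3/25/2013
223,777,xyz 3, 2.54,3/28/2013
223,777,xyz 3, 1.54,3/29/2013"""

df = pd.read_csv(io.StringIO(csv_text), skipinitialspace=True)
# parse the month/day/year strings explicitly
df['DATECHANGE'] = pd.to_datetime(df['DATECHANGE'], format='%m/%d/%Y')

# illustrative choice: keep only the columns that can be averaged
numeric_cols = ['UNITID', 'COMPETITORID', 'PRICE']

daily = (
    df.set_index('DATECHANGE')[numeric_cols]
      .resample('D')
      .mean()        # rows sharing a date are averaged, as in Out[7]
      .ffill()       # empty days take the last known values
      .reset_index()
)
print(daily.head())
```

The integer columns may come back as floats here, since the empty days are NaN before the forward fill. The averaged rows (for example COMPETITORID 610.5 on 2013-03-15) appear because two different competitors share that date.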

Thanks for your help. This is almost what I need, except there are repeating dates for the various UNITID/COMPETITORID combos. Regardless, this is still helpful. — user1624577, Jun 06 '15 at 08:40
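
To keep the UNITID/COMPETITORID combos from being mixed, one option is to resample each combo on its own and stack the results afterwards. This is only a sketch, assuming a pandas version with the chained resample('D').ffill() form; the names csv_text, key_cols, pieces and filled are mine, not from the question:

```python
import io
import pandas as pd

csv_text = """UNITID,COMPETITORID,COMPETORNAME,PRICE,DATECHANGE
123,555,xyz 1, 2.33,3/3/2013
123,555,xyz 1, 2.34,3/5/2013
123,555,xyz 1, 2.24,3/15/2013
123,666,xyz 2, 4.24,2/15/2013
123,666,xyz 2, 4.44,3/15/2013
123,666,xyz 2, 1.44,3/25/2013
223,777,xyz 3, 2.44,3/25/2013
223,777,xyz 3, 2.54,3/28/2013
223,777,xyz 3, 1.54,3/29/2013"""

df = pd.read_csv(io.StringIO(csv_text), skipinitialspace=True)
df['DATECHANGE'] = pd.to_datetime(df['DATECHANGE'], format='%m/%d/%Y')

# each unit/competitor combo gets its own daily date range
key_cols = ['UNITID', 'COMPETITORID']

pieces = []
for _, group in df.groupby(key_cols):
    # within one combo the dates are unique, so a plain forward fill
    # between its first and last date is enough
    per_day = (
        group.set_index('DATECHANGE')
             .sort_index()
             .resample('D')
             .ffill()
    )
    pieces.append(per_day)

filled = pd.concat(pieces).reset_index()
print(filled.head(6))
```

Each combo is only filled between its own first and last DATECHANGE, so the same date can appear once per combo. If every combo should run through a common end date, the index would have to be extended before the fill.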
