0

I have a dataframe like this:

zip      season   season_start_date   season_end_date
zip1     winter   2015-11-25          2016-03-09

I need kind to flatten the date between the start and end dates. I expect the output like this:

zip       season   date   
zip1      winter   2015-11-25
zip1      winter   2015-11-26
.
.
zip1      winter   2016-03-09

How could I realize in a more elegant way?

data = {"zip":["zip1","zip1"],
    "season":["s6","s6"],
    "season_start_date": ["2011-01-01","2011-01-01"], 
    "season_end_date" : ["2012-01-05","2012-01-05"]
   }
df = pd.DataFrame(data=data)

Thanks.

newleaf
  • 2,257
  • 8
  • 32
  • 52
  • Since I didn't have time to finish the solution i was writting, [here is an useful concept](https://stackoverflow.com/questions/7274267/print-all-day-dates-between-two-dates/7274316) I was using as reference :) – Pedro Martins de Souza Dec 11 '18 at 21:01
  • 2
    Possible duplicate of [Performance issue turning rows with start - end into a dataframe with TimeIndex](https://stackoverflow.com/questions/50747359/performance-issue-turning-rows-with-start-end-into-a-dataframe-with-timeindex) – Matthias Fripp Dec 11 '18 at 21:21
  • That's what I need. Thanks for the elegant solution. – newleaf Dec 11 '18 at 21:33

2 Answers2

0
from datetime import datetime, timedelta

Row_to_split = df.loc[1]
Season = Row_to_split['season']
Start_Date = datetime.strptime(Row_to_split['season_start_date']', '%Y-%m-%d')
End_Date = datetime.strptime(Row_to_split['season_end_date']', '%Y-%m-%d')
# initialize new_df 
for i in range((End_Date - Start_Date).days+1):
    new_df.loc[i] = [season, (Start_Date+timedelta(i)).strftime('%Y-%m-%d')]

Does that do what you wanted? I wasn't sure if the zip column was an index, but it should be apparent how to insert that.

SRT HellKitty
  • 577
  • 3
  • 10
0

You will need to generate a DataFrame from each row, and then concat them together:

res = pd.concat([
    pd.DataFrame({
        'zip': r.zip, 'season': r.season, 'date': pd.DatetimeIndex(
            start=r.season_start_date, end=r.season_end_date, freq='D'
        )
    }) for _, r in data.iterrows()
], sort=False)
Alpha
  • 2,372
  • 3
  • 21
  • 23