Pandas: Iterate over dataframe rows using date filter

Question

I have one dataframe that contains a series of dates and a corresponding value. I have another dataframe with three columns (mindate, maxdate, value). I want to iterate through each row of the first dataframe, using the second dataframes mindate and maxdate. I then want to multiply every Value by the CustomerUsage in the new range df and return the sum.

>>df1        Date           Value
        0    2012-04-01     0.00275
        1    2012-04-02     0.00278
        2    2012-04-03     0.00369
        3    2012-04-04     0.00268
        4    2012-04-05     0.00400

>>df2       Start           End           CustomerUsage
        1   2012-04-01      2012-04-03    464.0
        2   2012-04-04      2012-04-04    472.1

>>    for  row in df2.iterrows():

           mindate = row[row.index[0],'Start']
           maxdate = row[row.index[0],'End']

           range = df1[(df1['Dates'] >= mindate) & (df1['Dates'] <= maxdate)]

           range['Calc'] = range['Value']*df2['CustomerUsage']
           ##numpy .agg function here##

A single row will work, but I am stuck on iterating through the dates, error of AttributeError: 'str' object has no attribute 'loc' (I gather I'm treating these tuples wrong, but unsure of the remedy folks!)

Many thanks!

thanks, fixed! trying to multiply the customerusage by each value in the df1 date range. — D_Fresh, Apr 20 '18 at 21:40
look at this question https://stackoverflow.com/questions/46525786/how-to-join-two-dataframes-for-which-column-values-are-within-a-certain-range/46526249#46526249 — BENY, Apr 20 '18 at 21:43
shouldn't your last uncommented line read: range['Calc'] = range['Value']*row['CustomerUsage'] — Noam, Apr 20 '18 at 21:47
@Wen well im trying to do a similar merge, but for each row of df2, ultimately the numpy .agg function referenced creates a new column. i believe my hangup is on the mindate, maxdate once they are introduced to the for loop (treating the individual tuples wrong?) The actual df2 I am working with has hundreds of thousands of rows of customerusages for all sorts of different date ranges. — D_Fresh, Apr 20 '18 at 21:58

score 0 · Answer 1 · answered Apr 23 '18 at 21:24

Alrighty, so i ended up with the following answer. Definitely ran across a few @Wen posts haha

   #created an ID
   list_ = []
   df1.insert(0,'ID',range(0,0+len(df1)))
   for index, row in df1.iterrows():
       start = row['start']
       end = row['end']
       range = df2[(df2['date']>=start) & (df2['date']<=end)]
       df2['ID'] = row['ID']
       list_.append(df2)
   batch = pd.concat(list_)
   _small = batch.groupby(['ID']).agg({'value': np.sum})
   _merge = batch.reset_index().merge(_small.reset_index(), how = 'left', on = ['ID']

I am experimenting with itertuples as I'm reading its much faster. Speed is of the essence so if anyone has any upgrades... :)

Pandas: Iterate over dataframe rows using date filter

1 Answers1