1

I have one dataframe that contains a series of dates and a corresponding value. I have another dataframe with three columns (mindate, maxdate, value). I want to iterate through each row of the first dataframe, using the second dataframes mindate and maxdate. I then want to multiply every Value by the CustomerUsage in the new range df and return the sum.

>>df1        Date           Value
        0    2012-04-01     0.00275
        1    2012-04-02     0.00278
        2    2012-04-03     0.00369
        3    2012-04-04     0.00268
        4    2012-04-05     0.00400

>>df2       Start           End           CustomerUsage
        1   2012-04-01      2012-04-03    464.0
        2   2012-04-04      2012-04-04    472.1

>>    for  row in df2.iterrows():

           mindate = row[row.index[0],'Start']
           maxdate = row[row.index[0],'End']

           range = df1[(df1['Dates'] >= mindate) & (df1['Dates'] <= maxdate)]

           range['Calc'] = range['Value']*df2['CustomerUsage']
           ##numpy .agg function here##

A single row will work, but I am stuck on iterating through the dates, error of AttributeError: 'str' object has no attribute 'loc' (I gather I'm treating these tuples wrong, but unsure of the remedy folks!)

Many thanks!

D_Fresh
  • 11
  • 1
  • 4
  • Can your fix your data , end date is earlier than start – BENY Apr 20 '18 at 21:38
  • thanks, fixed! trying to multiply the customerusage by each value in the df1 date range. – D_Fresh Apr 20 '18 at 21:40
  • look at this question https://stackoverflow.com/questions/46525786/how-to-join-two-dataframes-for-which-column-values-are-within-a-certain-range/46526249#46526249 – BENY Apr 20 '18 at 21:43
  • shouldn't your last uncommented line read: range['Calc'] = range['Value']*row['CustomerUsage'] – Noam Apr 20 '18 at 21:47
  • @Wen well im trying to do a similar merge, but for each row of df2, ultimately the numpy .agg function referenced creates a new column. i believe my hangup is on the mindate, maxdate once they are introduced to the for loop (treating the individual tuples wrong?) The actual df2 I am working with has hundreds of thousands of rows of customerusages for all sorts of different date ranges. – D_Fresh Apr 20 '18 at 21:58

1 Answers1

0

Alrighty, so i ended up with the following answer. Definitely ran across a few @Wen posts haha

   #created an ID
   list_ = []
   df1.insert(0,'ID',range(0,0+len(df1)))
   for index, row in df1.iterrows():
       start = row['start']
       end = row['end']
       range = df2[(df2['date']>=start) & (df2['date']<=end)]
       df2['ID'] = row['ID']
       list_.append(df2)
   batch = pd.concat(list_)
   _small = batch.groupby(['ID']).agg({'value': np.sum})
   _merge = batch.reset_index().merge(_small.reset_index(), how = 'left', on = ['ID']

I am experimenting with itertuples as I'm reading its much faster. Speed is of the essence so if anyone has any upgrades... :)

D_Fresh
  • 11
  • 1
  • 4