
I am new to pandas. I have written a function that I want to apply to all entries of a pandas DataFrame, "Monthly_mean_consump"; a sample of the data is provided below.

Time    2010-08-31  2010-09-30  2010-10-31  2010-11-30  2010-12-31 2011-01-31  2011-02-28 2011-03-31  2011-04-30 
00:00   0.429188    0.302406    0.587415    0.227820    0.115938    0.170616    0.056256    0.078167    0.476515
00:30   0.340452    0.385037    0.218495    0.238118    0.134938    0.123279    0.054984    0.108111    0.173700
01:00   0.455451    0.433700    0.229352    0.253046    0.391425    0.313715    0.401116    0.123304    0.453640
01:30   0.601494    0.576142    0.425599    0.590769    0.486930    0.419002    0.560737    0.554705    0.544313
02:00   0.504342    0.584523    0.614539    0.375294    0.377951    0.342675    0.357372    0.367332    0.391336
02:30   0.527724    0.443303    0.457283    0.369515    0.392317    0.379410    0.391916    0.444807    0.491411
03:00   0.429236    0.531203    0.464098    0.370421    0.426875    0.360209    0.276282    0.179577    0.304990
03:30   0.442019    0.510604    0.314080    0.372268    0.443884    0.461649    0.390262    0.284042    0.417354
04:00   0.359956    0.411183    0.390059    0.398598    0.397369    0.351005    0.534459    0.317548    0.309686
04:30   0.352609    0.247238    0.421061    0.385950    0.300075    0.200626    0.402874    0.126153    0.261859

I have written a function whose input is an individual element of the DataFrame "Monthly_mean_consump". Wherever this value is at least 0.45, the function looks up the corresponding cost in another DataFrame, "df_loss_peak" (a sample is provided below), and returns that cost.

    Peak_Consumption    Cost
0   0.450000    1.413333
1   0.557895    4.651579
2   0.665789    7.133158
3   0.773684    9.614737
4   0.881579    12.096316
5   0.989474    14.577895
6   1.097368    17.059474
7   1.205263    19.541053
8   1.313158    22.022632
9   1.421053    24.504211
10  1.528947    26.985789
11  1.636842    29.467368
12  1.744737    31.948947
13  1.852632    34.430526
14  1.960526    36.912105
15  2.068421    39.393684
16  2.176316    41.875263
17  2.284211    44.356842
18  2.392105    46.838421
19  2.500000    49.320000

The function I have written is the following:

def cost_consump(mean_consump):
    # Return the cost of the first peak-consumption threshold above the mean
    if mean_consump >= 0.45:
        for i in range(len(df_loss_peak)):
            if df_loss_peak["Peak_Consumption"][i] > mean_consump:
                return df_loss_peak["Cost"][i]
    return 0

I have used two nested for loops (code below) to apply this function to each element of the DataFrame "Monthly_mean_consump", and I get the desired output.

Monthly_mean_cost = Monthly_mean_consump.copy(deep=True)

# Loop over every (time, month) cell and replace it with its cost
for i in range(Monthly_mean_consump.shape[0]):
    for j in range(Monthly_mean_consump.shape[1]):
        Monthly_mean_cost.iloc[i, j] = cost_consump(Monthly_mean_consump.iloc[i, j])

However, I am aware that this is quite time consuming, and pandas provides an efficient function, "applymap()", which is well suited to this job. I have used applymap() for various other tasks, but when I write the following code, I get a "KeyError: -1":

Monthly_mean_consump.transpose().applymap(cost_consump)

Any help would be appreciated. Also, am I right in thinking that applymap() would be much more efficient than the loops I have implemented?

Thanks

Pankaj

2 Answers


You can use the bisect library.

import bisect

def cost_consump(x, df_loss_peak):
    # bisect.bisect returns the index of the first threshold greater than x
    return df_loss_peak.iloc[bisect.bisect(df_loss_peak.Peak_Consumption, x)].Cost

Monthly_mean_cost.applymap(lambda x: cost_consump(x, df_loss_peak))

If you want values below 0.45 to map to a different cost, you may want to prepend a zero-cost row to your df_loss_peak DataFrame.

output:

    2010-08-31  2010-09-30  2010-10-31  2010-11-30  2010-12-31  2011-01-31  2011-02-28  2011-03-31  2011-04-30
0   1.413333    1.413333    7.133158    1.413333    1.413333    1.413333    1.413333    1.413333    4.651579
1   1.413333    1.413333    1.413333    1.413333    1.413333    1.413333    1.413333    1.413333    1.413333
2   4.651579    1.413333    1.413333    1.413333    1.413333    1.413333    1.413333    1.413333    4.651579
3   7.133158    7.133158    1.413333    7.133158    4.651579    1.413333    7.133158    4.651579    4.651579
4   4.651579    7.133158    7.133158    1.413333    1.413333    1.413333    1.413333    1.413333    1.413333
5   4.651579    1.413333    4.651579    1.413333    1.413333    1.413333    1.413333    1.413333    4.651579
6   1.413333    4.651579    4.651579    1.413333    1.413333    1.413333    1.413333    1.413333    1.413333
7   1.413333    4.651579    1.413333    1.413333    1.413333    4.651579    1.413333    1.413333    1.413333
8   1.413333    1.413333    1.413333    1.413333    1.413333    1.413333    4.651579    1.413333    1.413333
9   1.413333    1.413333    1.413333    1.413333    1.413333    1.413333    1.413333    1.413333    1.413333
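
The suggestion above can be put together as a runnable sketch. The small frames below are hypothetical stand-ins built from the first rows of the samples in the question, and prepending a zero-cost row at the 0.45 threshold is one way to read the suggestion (duplicating the 0.45 threshold makes bisect send every value below it to cost 0):

```python
import bisect

import pandas as pd

# Hypothetical stand-ins for the frames in the question (first rows of each sample)
df_loss_peak = pd.DataFrame({
    "Peak_Consumption": [0.450000, 0.557895, 0.665789, 2.500000],
    "Cost":             [1.413333, 4.651579, 7.133158, 49.320000],
})
Monthly_mean_consump = pd.DataFrame({"2010-08-31": [0.429188, 0.601494],
                                     "2010-09-30": [0.302406, 0.576142]})

# Prepend a zero-cost row at the 0.45 threshold so that values below
# 0.45 map to a cost of 0 instead of 1.413333
df_loss_peak = pd.concat(
    [pd.DataFrame({"Peak_Consumption": [0.45], "Cost": [0.0]}), df_loss_peak],
    ignore_index=True,
)

def cost_consump(x, df_loss_peak):
    # bisect.bisect (bisect_right) returns the index of the first threshold > x;
    # note values at or above the last threshold (2.5) would run off the table
    return df_loss_peak.iloc[bisect.bisect(df_loss_peak["Peak_Consumption"].tolist(), x)]["Cost"]

Monthly_mean_cost = Monthly_mean_consump.applymap(lambda x: cost_consump(x, df_loss_peak))
print(Monthly_mean_cost)
```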

If you don't want those entries, you can add:

Monthly_mean_cost.applymap(lambda x: cost_consump(x, df_loss_peak))[Monthly_mean_cost > 0.45]

output:

    2010-08-31  2010-09-30  2010-10-31  2010-11-30  2010-12-31  2011-01-31  2011-02-28  2011-03-31  2011-04-30
0   NaN     NaN     7.133158    NaN     NaN     NaN     NaN     NaN     4.651579
1   NaN     NaN     NaN     NaN     NaN     NaN     NaN     NaN     NaN
2   4.651579    NaN     NaN     NaN     NaN     NaN     NaN     NaN     4.651579
3   7.133158    7.133158    NaN     7.133158    4.651579    NaN     7.133158    4.651579    4.651579
4   4.651579    7.133158    7.133158    NaN     NaN     NaN     NaN     NaN     NaN
5   4.651579    NaN     4.651579    NaN     NaN     NaN     NaN     NaN     4.651579
6   NaN     4.651579    4.651579    NaN     NaN     NaN     NaN     NaN     NaN
7   NaN     4.651579    NaN     NaN     NaN     4.651579    NaN     NaN     NaN
8   NaN     NaN     NaN     NaN     NaN     NaN     4.651579    NaN     NaN
9   NaN     NaN     NaN     NaN     NaN     NaN     NaN     NaN     NaN
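
Since bisect does a Python-level lookup per element, the same threshold search can also be done for the whole frame in one vectorized call with NumPy's searchsorted. A sketch with hypothetical small frames, restoring the question's rule that values below 0.45 cost 0:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins for the frames in the question
df_loss_peak = pd.DataFrame({
    "Peak_Consumption": [0.450000, 0.557895, 0.665789],
    "Cost":             [1.413333, 4.651579, 7.133158],
})
Monthly_mean_consump = pd.DataFrame({"2010-08-31": [0.429188, 0.601494],
                                     "2010-09-30": [0.302406, 0.576142]})

vals = Monthly_mean_consump.to_numpy()
# side='right' mirrors bisect.bisect: index of the first threshold > value
idx = np.searchsorted(df_loss_peak["Peak_Consumption"].to_numpy(), vals, side="right")
idx = np.minimum(idx, len(df_loss_peak) - 1)   # guard values beyond the last threshold
costs = df_loss_peak["Cost"].to_numpy()[idx]   # look up the matching cost
costs[vals < 0.45] = 0.0                       # below the first threshold -> cost 0
Monthly_mean_cost = pd.DataFrame(costs, index=Monthly_mean_consump.index,
                                 columns=Monthly_mean_consump.columns)
print(Monthly_mean_cost)
```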
Tbaki

I think you can use cut with stack to get a Series from the DataFrame:

import numpy as np

labels = df_loss_peak['Cost']
bins = df_loss_peak['Peak_Consumption'].tolist() + [np.inf]
df = pd.cut(Monthly_mean_consump.stack(), bins=bins, labels=labels)

If you need to filter the data, add dropna + rename_axis + reset_index:

print (df.dropna().rename_axis(['time','date']).reset_index(name='val'))
     time        date       val
0   00:00  2010-10-31  4.651579
1   00:00  2011-04-30  1.413333
2   01:00  2010-08-31  1.413333
3   01:00  2011-04-30  1.413333
4   01:30  2010-08-31  4.651579
5   01:30  2010-09-30  4.651579
6   01:30  2010-11-30  4.651579
7   01:30  2010-12-31  1.413333
8   01:30  2011-02-28  4.651579
9   01:30  2011-03-31  1.413333
10  01:30  2011-04-30  1.413333
11  02:00  2010-08-31  1.413333
12  02:00  2010-09-30  4.651579
13  02:00  2010-10-31  4.651579
14  02:30  2010-08-31  1.413333
15  02:30  2010-10-31  1.413333
16  02:30  2011-04-30  1.413333
17  03:00  2010-09-30  1.413333
18  03:00  2010-10-31  1.413333
19  03:30  2010-09-30  1.413333
20  03:30  2011-01-31  1.413333
21  04:00  2011-02-28  1.413333

If you need to replace the values, use unstack + combine_first:

print (df.unstack())
      2010-08-31 2010-09-30 2010-10-31 2010-11-30 2010-12-31 2011-01-31  \
Time                                                                      
00:00        NaN        NaN   4.651579        NaN        NaN        NaN   
00:30        NaN        NaN        NaN        NaN        NaN        NaN   
01:00   1.413333        NaN        NaN        NaN        NaN        NaN   
01:30   4.651579   4.651579        NaN   4.651579   1.413333        NaN   
02:00   1.413333   4.651579   4.651579        NaN        NaN        NaN   
02:30   1.413333        NaN   1.413333        NaN        NaN        NaN   
03:00        NaN   1.413333   1.413333        NaN        NaN        NaN   
03:30        NaN   1.413333        NaN        NaN        NaN   1.413333   
04:00        NaN        NaN        NaN        NaN        NaN        NaN   
04:30        NaN        NaN        NaN        NaN        NaN        NaN   

      2011-02-28 2011-03-31 2011-04-30  
Time                                    
00:00        NaN        NaN   1.413333  
00:30        NaN        NaN        NaN  
01:00        NaN        NaN   1.413333  
01:30   4.651579   1.413333   1.413333  
02:00        NaN        NaN        NaN  
02:30        NaN        NaN   1.413333  
03:00        NaN        NaN        NaN  
03:30        NaN        NaN        NaN  
04:00   1.413333        NaN        NaN  
04:30        NaN        NaN        NaN

print (df.unstack().combine_first(Monthly_mean_consump))
       2010-08-31  2010-09-30  2010-10-31  2010-11-30  2010-12-31  2011-01-31  \
Time                                                                            
00:00    0.429188    0.302406    4.651579    0.227820    0.115938    0.170616   
00:30    0.340452    0.385037    0.218495    0.238118    0.134938    0.123279   
01:00    1.413333    0.433700    0.229352    0.253046    0.391425    0.313715   
01:30    4.651579    4.651579    0.425599    4.651579    1.413333    0.419002   
02:00    1.413333    4.651579    4.651579    0.375294    0.377951    0.342675   
02:30    1.413333    0.443303    1.413333    0.369515    0.392317    0.379410   
03:00    0.429236    1.413333    1.413333    0.370421    0.426875    0.360209   
03:30    0.442019    1.413333    0.314080    0.372268    0.443884    1.413333   
04:00    0.359956    0.411183    0.390059    0.398598    0.397369    0.351005   
04:30    0.352609    0.247238    0.421061    0.385950    0.300075    0.200626   

       2011-02-28  2011-03-31  2011-04-30  
Time                                       
00:00    0.056256    0.078167    1.413333  
00:30    0.054984    0.108111    0.173700  
01:00    0.401116    0.123304    1.413333  
01:30    4.651579    1.413333    1.413333  
02:00    0.357372    0.367332    0.391336  
02:30    0.391916    0.444807    1.413333  
03:00    0.276282    0.179577    0.304990  
03:30    0.390262    0.284042    0.417354  
04:00    1.413333    0.317548    0.309686  
04:30    0.402874    0.126153    0.261859  
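
The steps above can be put together as one self-contained sketch. The small frames are hypothetical stand-ins built from the samples in the question; the extra astype(float) converts the categorical labels produced by pd.cut before combine_first, which newer pandas versions may require:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins for the frames in the question
df_loss_peak = pd.DataFrame({
    "Peak_Consumption": [0.450000, 0.557895, 0.665789],
    "Cost":             [1.413333, 4.651579, 7.133158],
})
Monthly_mean_consump = pd.DataFrame(
    {"2010-08-31": [0.429188, 0.601494], "2010-09-30": [0.302406, 0.576142]},
    index=["00:00", "01:30"],
)
Monthly_mean_consump.index.name = "Time"

# Bin each stacked value by the consumption thresholds; the Cost column
# supplies the labels (they must be unique, which pd.cut requires)
labels = df_loss_peak["Cost"]
bins = df_loss_peak["Peak_Consumption"].tolist() + [np.inf]
df = pd.cut(Monthly_mean_consump.stack(), bins=bins, labels=labels)

# Values below the first threshold fall outside the bins and become NaN;
# combine_first fills those slots with the original consumption values
out = df.unstack().astype(float).combine_first(Monthly_mean_consump)
print(out)
```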
jezrael