My dataframe contains three replications for each treatment. I want to loop through each treatment and, within each treatment, fit a model to each replication. I managed to loop through the treatments, but I also need to loop through the replications of each treatment. Ideally, the output should be saved in a new dataframe that contains 'treatment' and 'replication' columns. Any suggestions?

The dataframe (df) looks like this:

 treatment  replication  time  y
 8          1            1     0.1
 8          1            2     0.1
 8          1            3     0.1
 8          2            1     0.1
 8          2            2     0.1
 8          2            3     0.1
 10         1            1     0.1
 10         1            2     0.1
 10         1            3     0.1
 10         2            1     0.1
 10         2            2     0.1
 10         2            3     0.1

for i, g in df.groupby('treatment'):
   k = g.iloc[0].y                       # initial value of y for this treatment
   popt, pcov = curve_fit(model, x, y)
   fit_m = popt
   

I now use iterrows, but then I can no longer index NPQ with [0] to get the initial value. Any idea how to solve this? The code and the error message are:

for index, row in HL.iterrows():
  g = (index, row['filename'], row['hr'], row['time'], row['NPQ'])
  k = g.iloc[0]['NPQ']

AttributeError: 'tuple' object has no attribute 'iloc'

Thank you in advance

  • `df.groupby(['treatment', 'replication'])` – Kenan Jan 22 '21 at 21:07
  • It's possible to do it without looping, which would improve the time efficiency of your code. We just need to know how you define `x` and `y` (the arguments of `curve_fit`) – Ralubrusto Jan 22 '21 at 21:11
  • Keep in mind that, in general, solving a problem in pandas with a loop is the wrong approach. See [How to iterate over rows in a DataFrame in Pandas](https://stackoverflow.com/q/16476924/7758804) & [Fast, Flexible, Easy and Intuitive: How to Speed Up Your Pandas Projects](https://realpython.com/fast-flexible-pandas/) – Trenton McKinney Jan 22 '21 at 22:15
  • @Ralubrusto, I define x = time and y = y from the dataframe. Thank you in advance – Martina Lazzarin Jan 23 '21 at 18:34
  • @TrentonMcKinney please see my update in the question: I used iterrows, but then I cannot make the previous code work. Any advice? Thank you! – Martina Lazzarin Jan 23 '21 at 19:38
  • Please show what your expected output is, given the sample dataframe. – Trenton McKinney Jan 23 '21 at 20:03

1 Answer

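grouped_df = HL.groupby(["hr", "filename"])

for key, g in grouped_df:
   # g is the sub-dataframe for one (hr, filename) pair, so .iloc works on it
   k = g["NPQ"].iloc[0]                  # initial value of this group
   # assuming x is the time column and y is the NPQ column, as in the question
   popt, pcov = curve_fit(model, g["time"], g["NPQ"])
   fit_m = popt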
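
To also collect the results into a new dataframe, something like the following might work. This is only a sketch on the `df` layout from the question (`treatment`, `replication`, `time`, `y`). The straight-line `model`, the toy values of `y`, the helper name `fit_groups` and the output column names `y0`, `a`, `b` are all assumptions for illustration, since the real model was not posted; swap in your own function and parameters.

```python
import pandas as pd
from scipy.optimize import curve_fit


def model(t, a, b):
    # Hypothetical model (a straight line); replace with your own function.
    return a * t + b


def fit_groups(df, keys=("treatment", "replication")):
    """Fit `model` once per (treatment, replication) group, one row per fit."""
    records = []
    # Grouping by two columns yields a (treatment, replication) tuple as key
    # and the matching sub-dataframe as g.
    for (treatment, replication), g in df.groupby(list(keys)):
        g = g.sort_values("time")
        x = g["time"].to_numpy(dtype=float)
        y = g["y"].to_numpy(dtype=float)
        popt, pcov = curve_fit(model, x, y)
        records.append(
            {
                "treatment": treatment,
                "replication": replication,
                "y0": y[0],  # initial value of the group (k in the question)
                "a": popt[0],
                "b": popt[1],
            }
        )
    return pd.DataFrame(records)


# Toy data (made up): y = 0.1 * treatment * time + replication
rows = [
    {"treatment": tr, "replication": rep, "time": t, "y": 0.1 * tr * t + rep}
    for tr in (8, 10)
    for rep in (1, 2)
    for t in (1, 2, 3, 4)
]
df = pd.DataFrame(rows)

results = fit_groups(df)
print(results)
```

Each row of `results` then holds one fit, identified by its `treatment` and `replication`. The loop here runs over groups rather than over rows, so `g` stays a dataframe and `.iloc` keeps working, unlike with `iterrows`.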