
I've written this code and the output is not quite what I expected. The for loop seems to run the first iteration twice, then skip the second and jump straight to the third. I can't see where I've gone wrong, so could someone point out the error? Thank you!

Code below:

i = 0
df_int = df1[(df1.sLap > df_z.Entry[i]) & (df1.sLap < df_z.Exit[i]) & (df1.NLap == Lap1)]

df_Entry = df_int.groupby(df_int.BCornerEntry).aggregate([np.mean, np.std])


df_Entry.rename(index={1:  'T'+str(df_z['Turn Number'][i])}, inplace=True)

for i in range(len(df_z)):
    df_int = df1[(df1.sLap > df_z.Entry[i]) & (df1.sLap < df_z.Exit[i]) & (df1.NLap == Lap1)]

    df_Entry2 = df_int.groupby(df_int.BCornerEntry).aggregate([np.mean, np.std])

    df_Entry2.rename(index={1:  'T'+str(df_z['Turn Number'][i])}, inplace=True)

    df_Entry = pd.concat([df_Entry, df_Entry2])

df_z is a DataFrame loaded from an Excel document, with data like this:

    Turn Number  Entry  Exit
0             1    321   441
1             2    893  1033
2             3   1071  1184
3             4   1234  1352
4             5   2354  2454
5             6   2464  2554
6             7   2574  2689
7             8   2955  3120
...   and so on
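For a reproducible example, which the comments end up asking for, the rows shown can be built directly instead of read from Excel. This is a sketch that assumes only the eight rows in the excerpt; in practice df_z would come from the Excel file, e.g. via `pd.read_excel`.

```python
import pandas as pd

# Rows copied from the df_z excerpt above; the real sheet continues past Turn 8.
df_z = pd.DataFrame(
    {
        "Turn Number": [1, 2, 3, 4, 5, 6, 7, 8],
        "Entry": [321, 893, 1071, 1234, 2354, 2464, 2574, 2955],
        "Exit": [441, 1033, 1184, 1352, 2454, 2554, 2689, 3120],
    }
)

# Each row defines one corner window on the lap-distance axis (Entry < sLap < Exit).
print(df_z)
print(len(df_z))  # the final result should have exactly this many rows
```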

Then df1 is a massive DataFrame with 30 columns and tens of thousands of rows (hence the mean and std).

My output should be:

                tLap            
                mean       std      
BCornerEntry                                                          
T1              6.845490  0.591227      
T2             14.515195  0.541967      
T3             19.598690  0.319181     
T4             21.555500  0.246757     
T5             34.980000  0.518170     
T6             37.245000  0.209284     
T7             40.220541  0.322800
...   and so on

However I get this:

                tLap            
                mean       std      
BCornerEntry                                                          
T1              6.845490  0.591227      
T1              6.845490  0.591227      
T3             19.598690  0.319181     
T4             21.555500  0.246757     
T5             34.980000  0.518170     
T6             37.245000  0.209284     
T7             40.220541  0.322800
...   and so on

T2 is missing: in its place is a second T1 row with exactly the same numbers. What have I done wrong? Any help would be greatly appreciated!

Edited by Anand S Kumar; asked by OParker
  • This is by design as stated in the dup [question](http://stackoverflow.com/questions/21390035/python-pandas-groupby-object-apply-method-duplicates-first-group) – EdChum Sep 17 '15 at 10:49
  • @Anand S Kumar did you really just edit it to take off the 'Thank You' at the end? – OParker Sep 17 '15 at 10:55
  • Take a look at - http://meta.stackexchange.com/questions/2950/should-hi-thanks-taglines-and-salutations-be-removed-from-posts – Anand S Kumar Sep 17 '15 at 10:57
  • @AnandSKumar I was brought up to say 'Thank You' when people give you free advice... – OParker Sep 17 '15 at 11:01
  • @EdChum This is not a duplicate, the OP does not even use apply here. – joris Sep 17 '15 at 11:20
  • @OParker Try using `for i in range(1, len(df_z)):`, as range starts with 0 and the i=0 case is already done before the for loop. – joris Sep 17 '15 at 11:22
  • @joris that's true will retract the dup vote – EdChum Sep 17 '15 at 11:22
  • @joris Thank you, that has removed the duplicate, however T2 is still missing? There's also one T fewer than there should be at the end, i.e. it finishes on T18 and there should be a T19 as well. Any ideas? – OParker Sep 17 '15 at 11:28
  • @OParker That is difficult to say unless you provide a reproducible example. But look at `len(df_z)`, the result will have the same number of rows. So if that is wrong, there is something wrong in the logic. – joris Sep 17 '15 at 11:37
  • Thanks @joris I'll look into it. Might be an issue with the data :/ – OParker Sep 17 '15 at 12:27
  • @joris It was an issue with the data but you still answered the first part of my question so thank you! If you put an answer below I'll mark it as the correct version so you get the points you deserve! – OParker Sep 17 '15 at 13:02
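The off-by-one joris points out, and his check that the result should have len(df_z) rows, can be seen without pandas at all. A minimal sketch, with a hypothetical `turns` list standing in for `df_z['Turn Number']` and a string label standing in for each grouped result:

```python
# Hypothetical stand-in for df_z['Turn Number'].
turns = [1, 2, 3, 4]

def label(i):
    # Mirrors 'T' + str(df_z['Turn Number'][i]) from the question.
    return "T" + str(turns[i])

# Pattern from the question: i = 0 is handled before the loop...
rows = [label(0)]
for i in range(len(turns)):  # ...and again here, because range starts at 0
    rows.append(label(i))
print(rows)  # ['T1', 'T1', 'T2', 'T3', 'T4']: one row too many

# Fix: start the loop at 1 so every turn is processed exactly once.
fixed = [label(0)]
for i in range(1, len(turns)):
    fixed.append(label(i))
print(fixed)  # ['T1', 'T2', 'T3', 'T4']: same length as turns
```

Note that the buggy loop still produces T2; the missing T2 in the question turned out to be a data issue, as the comments above conclude.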

1 Answer


Instead of range(len(df_z)), try using:

for i in range(1, len(df_z)):
    ...

because range starts at 0 and the i = 0 case has already been handled before the for loop, so it gets included twice.
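An alternative that avoids the duplicated setup altogether is to drop the pre-loop copy, collect one result per corner in a list, and concatenate once at the end (this also avoids calling pd.concat on a growing DataFrame inside the loop). This is only a sketch: df1 below is made-up data using the column names from the question, BCornerEntry is assumed to be 1 inside each window, only tLap is aggregated, and the string names "mean"/"std" stand in for np.mean/np.std (recent pandas versions emit a FutureWarning for the NumPy callables).

```python
import pandas as pd

# Corner windows: first three rows of the df_z excerpt in the question.
df_z = pd.DataFrame({"Turn Number": [1, 2, 3],
                     "Entry": [321, 893, 1071],
                     "Exit": [441, 1033, 1184]})

# Hypothetical telemetry standing in for df1: lap distance, lap number,
# a corner-entry flag (assumed to be 1 inside each window) and lap time.
df1 = pd.DataFrame({"sLap": [350, 400, 900, 1000, 1100, 1150],
                    "NLap": [1, 1, 1, 1, 1, 1],
                    "BCornerEntry": [1, 1, 1, 1, 1, 1],
                    "tLap": [6.5, 7.1, 14.2, 14.9, 19.4, 19.8]})
Lap1 = 1

pieces = []
for i in range(len(df_z)):  # single loop: every corner is handled exactly once
    entry, exit_ = df_z.Entry[i], df_z.Exit[i]
    df_int = df1[(df1.sLap > entry) & (df1.sLap < exit_) & (df1.NLap == Lap1)]
    stats = df_int.groupby("BCornerEntry")[["tLap"]].agg(["mean", "std"])
    # Relabel the BCornerEntry == 1 group with this corner's turn name.
    stats = stats.rename(index={1: "T" + str(df_z["Turn Number"][i])})
    pieces.append(stats)

df_Entry = pd.concat(pieces)  # one concat at the end
print(df_Entry)
```

With this structure there is no separate i = 0 step to keep in sync with the loop, so the result has one row per row of df_z by construction.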

Answered by joris