Pandas iterating in range. Same number twice?

Question

I've written this code, but the output is not quite what I expected. It seems that the for loop runs through the first iteration twice, then skips the second and jumps straight to the third. I cannot see where I have gone wrong, so could someone point out the error? Thank you!

Code below:

i = 0
df_int = df1[(df1.sLap > df_z.Entry[i]) & (df1.sLap < df_z.Exit[i]) & (df1.NLap == Lap1)]

df_Entry = df_int.groupby(df_int.BCornerEntry).aggregate([np.mean, np.std])


df_Entry.rename(index={1:  'T'+str(df_z['Turn Number'][i])}, inplace=True)

for i in range(len(df_z)):
    df_int = df1[(df1.sLap > df_z.Entry[i]) & (df1.sLap < df_z.Exit[i]) & (df1.NLap == Lap1)]

    df_Entry2 = df_int.groupby(df_int.BCornerEntry).aggregate([np.mean, np.std])

    df_Entry2.rename(index={1:  'T'+str(df_z['Turn Number'][i])}, inplace=True)

    df_Entry = pd.concat([df_Entry, df_Entry2])

df_z is read from an Excel document and contains data like this:

    Turn Number  Entry  Exit
0             1    321   441
1             2    893  1033
2             3   1071  1184
3             4   1234  1352
4             5   2354  2454
5             6   2464  2554
6             7   2574  2689
7             8   2955  3120..... and so on

Then df1 is a massive DataFrame with 30 columns and tens of thousands of rows (hence the mean and std).

My output should be:

                tLap            
                mean       std      
BCornerEntry                                                          
T1              6.845490  0.591227      
T2             14.515195  0.541967      
T3             19.598690  0.319181     
T4             21.555500  0.246757     
T5             34.980000  0.518170     
T6             37.245000  0.209284     
T7             40.220541  0.322800.... and so on

However I get this:

                tLap            
                mean       std      
BCornerEntry                                                          
T1              6.845490  0.591227      
T1              6.845490  0.591227      
T3             19.598690  0.319181     
T4             21.555500  0.246757     
T5             34.980000  0.518170     
T6             37.245000  0.209284     
T7             40.220541  0.322800..... and so on

T2 is labelled T1, and its numbers are the same as T1's. What have I done wrong? Any help would be greatly appreciated!
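A minimal, self-contained reproduction can make a question like this easier to answer. The sketch below uses small hypothetical stand-ins for df_z, df1 and Lap1 (made-up values, not the real lap data), restricts the aggregation to the tLap column, and passes the aggregation names as strings. The helper corner_stats is also hypothetical; it only wraps the filter/groupby/rename steps from the code above. It reproduces the duplicated T1 row:

```python
import pandas as pd

# Hypothetical stand-ins for the OP's data: three turns and a tiny lap trace.
df_z = pd.DataFrame({"Turn Number": [1, 2, 3],
                     "Entry": [0, 10, 20],
                     "Exit": [5, 15, 25]})
df1 = pd.DataFrame({"sLap": [1, 2, 11, 12, 21, 22],
                    "tLap": [1.0, 2.0, 11.0, 12.0, 21.0, 22.0],
                    "BCornerEntry": [1] * 6,
                    "NLap": [1] * 6})
Lap1 = 1


def corner_stats(i):
    """Hypothetical helper: mean/std of tLap in turn i, group label 1 renamed to 'T<n>'."""
    df_int = df1[(df1.sLap > df_z.Entry[i]) & (df1.sLap < df_z.Exit[i]) & (df1.NLap == Lap1)]
    stats = df_int.groupby(df_int.BCornerEntry)[["tLap"]].aggregate(["mean", "std"])
    return stats.rename(index={1: "T" + str(df_z["Turn Number"][i])})


# Same structure as the question: i = 0 is handled before the loop ...
df_Entry = corner_stats(0)
# ... and the loop starts at 0 again, so turn 1 is appended a second time.
for i in range(len(df_z)):
    df_Entry = pd.concat([df_Entry, corner_stats(i)])

print(df_Entry.index.tolist())  # ['T1', 'T1', 'T2', 'T3']
```

The result has len(df_z) + 1 rows, which is a quick way to spot that one turn is being added twice.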

This is by design as stated in the dup [question](http://stackoverflow.com/questions/21390035/python-pandas-groupby-object-apply-method-duplicates-first-group) — EdChum, Sep 17 '15 at 10:49
@Anand S Kumar did you really just edit it to take off the 'Thank You' at the end? — OParker, Sep 17 '15 at 10:55
Take a look at - http://meta.stackexchange.com/questions/2950/should-hi-thanks-taglines-and-salutations-be-removed-from-posts — Anand S Kumar, Sep 17 '15 at 10:57
@AnandSKumar I was brought up to say 'Thank You' when people give you free advice... — OParker, Sep 17 '15 at 11:01
@EdChum This is not a duplicate, the OP does not even use apply here. — joris, Sep 17 '15 at 11:20
@OParker Try using `for i in range(1, len(df_z)):`, as range starts with 0 and the i=0 case is already done before the for loop. — joris, Sep 17 '15 at 11:22
@joris Thank you, it has removed the duplicate, however T2 is still missing? There's also one fewer T than there should be at the end, i.e. it finishes on T18 and there should be a T19 as well. Any ideas? — OParker, Sep 17 '15 at 11:28
@OParker That is difficult to say unless you provide a reproducible example. But look at `len(df_z)`, the result will have the same number of rows. So if that is wrong, there is something wrong in the logic. — joris, Sep 17 '15 at 11:37
Thanks @joris I'll look into it. Might be an issue with the data :/ — OParker, Sep 17 '15 at 12:27
@joris It was an issue with the data but you still answered the first part of my question so thank you! If you put an answer below I'll mark it as the correct version so you get the points you deserve! — OParker, Sep 17 '15 at 13:02
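Following joris's suggestion to compare the result against len(df_z), one way to hunt for data problems is to count how many rows each turn's filter selects. This is a hedged diagnostic sketch on hypothetical toy data in which turn 2's Entry/Exit window deliberately contains no samples; it is not a claim about what was actually wrong in the OP's data.

```python
import pandas as pd

# Hypothetical toy data; turn 2's window (10 to 15) deliberately contains no samples.
df_z = pd.DataFrame({"Turn Number": [1, 2, 3],
                     "Entry": [0, 10, 20],
                     "Exit": [5, 15, 25]})
df1 = pd.DataFrame({"sLap": [1, 2, 21, 22],
                    "tLap": [1.0, 2.0, 21.0, 22.0],
                    "BCornerEntry": [1] * 4,
                    "NLap": [1] * 4})
Lap1 = 1

# Count how many samples each turn's filter (same condition as the question) selects.
counts = {}
for i in range(len(df_z)):
    mask = (df1.sLap > df_z.Entry[i]) & (df1.sLap < df_z.Exit[i]) & (df1.NLap == Lap1)
    counts["T" + str(df_z["Turn Number"][i])] = int(mask.sum())

# Turns whose filter matched nothing would silently disappear after groupby + concat.
empty_turns = [turn for turn, n in counts.items() if n == 0]
print(counts)       # {'T1': 2, 'T2': 0, 'T3': 2}
print(empty_turns)  # ['T2']
```

A turn with zero matching rows yields an empty groupby result and simply vanishes from the concatenated output. Conversely, a turn whose window contains more than one BCornerEntry value contributes more than one row, and only the row labelled 1 gets renamed.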

score 2 · Accepted Answer · answered Sep 17 '15 at 13:21

Instead of range(len(df_z)), try using:

for i in range(1, len(df_z)):
    ...

as range starts with 0 and the i=0 case is already done before the for loop (so for this reason it is included twice).
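A sketch of the fix on hypothetical toy data (df_z, df1, Lap1 and the helper corner_stats are illustrative stand-ins, not the OP's real data). It applies the range(1, len(df_z)) change and, for comparison, an alternative that is not part of this answer: drop the special case for i = 0 and concatenate a list of per-turn frames once.

```python
import pandas as pd

# Hypothetical toy data standing in for df_z and df1 from the question.
df_z = pd.DataFrame({"Turn Number": [1, 2, 3],
                     "Entry": [0, 10, 20],
                     "Exit": [5, 15, 25]})
df1 = pd.DataFrame({"sLap": [1, 2, 11, 12, 21, 22],
                    "tLap": [1.0, 2.0, 11.0, 12.0, 21.0, 22.0],
                    "BCornerEntry": [1] * 6,
                    "NLap": [1] * 6})
Lap1 = 1


def corner_stats(i):
    """Hypothetical helper: mean/std of tLap in turn i, group label 1 renamed to 'T<n>'."""
    df_int = df1[(df1.sLap > df_z.Entry[i]) & (df1.sLap < df_z.Exit[i]) & (df1.NLap == Lap1)]
    stats = df_int.groupby(df_int.BCornerEntry)[["tLap"]].aggregate(["mean", "std"])
    return stats.rename(index={1: "T" + str(df_z["Turn Number"][i])})


# The accepted fix: seed with turn 0, then loop over the remaining turns only.
df_Entry = corner_stats(0)
for i in range(1, len(df_z)):
    df_Entry = pd.concat([df_Entry, corner_stats(i)])

# Alternative (not from the answer): no special case for i = 0;
# collect every turn in a list and concatenate once at the end.
df_Entry_alt = pd.concat([corner_stats(i) for i in range(len(df_z))])

print(df_Entry.index.tolist())        # ['T1', 'T2', 'T3']
print(df_Entry.equals(df_Entry_alt))  # True
```

Concatenating once also avoids copying the growing frame on every iteration; for a few dozen turns the difference is negligible, but collecting frames in a list is the pattern the pandas documentation recommends.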

joris
