Pandas : finding the average of values within a particular limit in a column

Question

I have imported the first three columns of a .csv file, named Time, Test 1 and Test 2, into my Python program.

import pandas as pd
fields = ['Time', 'Test 1', 'Test 2']
df=pd.read_csv('file.csv', skipinitialspace=True, usecols=fields)

Here is the file which I imported in the program.

How can I make a function which finds the mean/average of the values in the Test 1 column between a given time limit? The time limits (starting and end values) are to be taken as the parameters in the function.

e.g., I want to find the average of the values in the column Test 1 from 0.50 seconds to 4.88 seconds. The limits (0.50 and 4.88) would be the function's parameter.

Welcome to SO. In future, please provide data as text, *not* as an image, *not* as a link. Thank you! — jpp, Mar 19 '18 at 11:39

jezrael · Accepted Answer · 2018-03-19T11:36:58.757

3

I think you need `between` to build a boolean mask, then filter by boolean indexing and take the mean:

def custom_mean(x,y):
    return df.loc[df['Time'].between(x,y), 'Test 1'].mean()

Sample:

import numpy as np
import pandas as pd

df = pd.DataFrame({'Time':[0.0, 0.25, 0.5, 0.68, 0.94, 1.25, 1.65, 1.88, 2.05, 2.98, 3.45, 3.99, 4.06, 4.68, 4.88, 5.06, 6.0],
                   'Test 1':np.random.randint(10, size=17)})

print (df)
    Test 1  Time
0        3  0.00
1        6  0.25
2        5  0.50
3        4  0.68
4        8  0.94
5        9  1.25
6        1  1.65
7        7  1.88
8        9  2.05
9        6  2.98
10       8  3.45
11       0  3.99
12       5  4.06
13       0  4.68
14       9  4.88
15       6  5.06
16       2  6.00

def custom_mean(x,y):
    return df.loc[df['Time'].between(x,y), 'Test 1'].mean()

print (custom_mean(0.50, 1.0))
5.666666666666667

#verify
print (df.loc[df['Time'].between(0.50, 1.0), 'Test 1'])
2    5
3    4
4    8
Name: Test 1, dtype: int32
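
As a variation on the function above, here is a minimal sketch that takes the DataFrame and column names as arguments instead of relying on a global df. The name mean_in_window and its parameters are hypothetical, chosen only for illustration, and the Test 1 values are copied from the printed sample so the result is reproducible.

```python
import pandas as pd

def mean_in_window(frame, start, end, time_col="Time", value_col="Test 1"):
    """Mean of value_col over rows whose time_col lies in [start, end]."""
    # Series.between is inclusive on both ends by default.
    mask = frame[time_col].between(start, end)
    return frame.loc[mask, value_col].mean()

# Fixed copy of the sample shown above (hypothetical variable name).
sample = pd.DataFrame({
    "Time": [0.0, 0.25, 0.5, 0.68, 0.94, 1.25, 1.65, 1.88, 2.05,
             2.98, 3.45, 3.99, 4.06, 4.68, 4.88, 5.06, 6.0],
    "Test 1": [3, 6, 5, 4, 8, 9, 1, 7, 9, 6, 8, 0, 5, 0, 9, 6, 2],
})

print(mean_in_window(sample, 0.50, 1.0))    # same window as the example above
print(mean_in_window(sample, 0.50, 4.88))   # the window from the question
```

Passing the frame explicitly keeps the function usable on any DataFrame with a time column, at the cost of a slightly longer call.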

edited Mar 19 '18 at 11:36

answered Mar 19 '18 at 11:25

jezrael


When you entered 1.0 as the end limit for the function, did it automatically take the value corresponding to 0.94 seconds? (since 1.0 is not in the column anywhere) – Arpit Sharma Mar 19 '18 at 11:44
Yes, exactly. Check the bottom of the answer, it shows exactly which rows are selected. – jezrael Mar 19 '18 at 11:45
Thanks @jezrael. Also, if the starting limit is not present in the column, is it going to take the value corresponding to the time just greater than that? For instance, if I enter starting limit as 0.6, will it take the value corresponding to 0.68? – Arpit Sharma Mar 19 '18 at 11:51
Yes, sure, you can check it by `print (df.loc[df['Time'].between(0.6, 1.0)])` – jezrael Mar 19 '18 at 11:52
Thanks for the help, @jezrael. I'm quite new to Python – Arpit Sharma Mar 19 '18 at 11:53
@ArpitSharma - No problem, every coder starts somewhere :) – jezrael Mar 19 '18 at 11:54
@ArpitSharma - Is there a problem? It seems the answer was not accepted. – jezrael Mar 19 '18 at 11:56
@ArpitSharma - Thank you. Btw, a small piece of advice: in your next question it is better to create a small data sample, check [how to provide a great pandas example](http://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples). – jezrael Mar 19 '18 at 11:59

I'll take note of that. Thank you. – Arpit Sharma Mar 19 '18 at 12:04
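
The behaviour discussed in the comments above, where the limits themselves do not appear in the Time column, can be checked with a small sketch like the following. The shortened data and the variable names are assumptions made for illustration. It also shows one edge case not raised in the thread: a window that matches no rows yields NaN rather than an error.

```python
import math
import pandas as pd

# Shortened, hypothetical sample with the same column names as the question.
sample = pd.DataFrame({
    "Time": [0.0, 0.25, 0.5, 0.68, 0.94, 1.25],
    "Test 1": [3, 6, 5, 4, 8, 9],
})

# Neither 0.6 nor 1.0 occurs in Time; every row with 0.6 <= Time <= 1.0 is kept.
selected = sample.loc[sample["Time"].between(0.6, 1.0)]
print(selected["Time"].tolist())

# A window that contains no rows gives an empty selection, whose mean is NaN.
empty_mean = sample.loc[sample["Time"].between(10.0, 20.0), "Test 1"].mean()
print(math.isnan(empty_mean))
```

So between does not snap to the nearest time. It simply keeps every row whose time falls inside the limits.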

score 0 · Answer 2 · answered Mar 19 '18 at 14:38


You can combine a `between` mask with the mean and std functions from the NumPy library.
For example, this line computes the mean of Test 1 over the rows where Time is between 0.0 and 5.0:

np.mean(df[df['Time'].between(0.0, 5.0)]['Test 1'])
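
Since this answer also mentions std, here is a hedged sketch comparing the NumPy functions with the equivalent pandas methods on the same filtered values. The data and variable names are hypothetical. One detail worth knowing: np.std on an array defaults to ddof=0 (population standard deviation), while Series.std defaults to ddof=1 (sample standard deviation), so the two differ unless ddof is set explicitly.

```python
import numpy as np
import pandas as pd

# Shortened, hypothetical sample with the same column names as the question.
sample = pd.DataFrame({
    "Time": [0.0, 0.25, 0.5, 0.68, 0.94, 1.25],
    "Test 1": [3, 6, 5, 4, 8, 9],
})

# Values of Test 1 whose Time lies in [0.0, 1.0].
values = sample.loc[sample["Time"].between(0.0, 1.0), "Test 1"]
arr = values.to_numpy()

mean_np = np.mean(arr)               # NumPy mean on the raw array
mean_pd = values.mean()              # pandas mean, same result

std_np_population = np.std(arr)      # ddof=0 by default
std_np_sample = np.std(arr, ddof=1)  # matches the pandas default
std_pd = values.std()                # ddof=1 by default

print(mean_np, mean_pd)
print(std_np_population, std_np_sample, std_pd)
```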


N.Hung

