1

I have imported the first three columns of a .csv file, named Time, Test 1 and Test 2, into my Python program.

import pandas as pd
fields = ['Time', 'Test 1', 'Test 2']
df = pd.read_csv('file.csv', skipinitialspace=True, usecols=fields)

Here is the file that I imported into the program:

[image of the imported file]

How can I write a function that finds the mean of the values in the Test 1 column within a given time range? The time limits (start and end values) should be the function's parameters.

For example, I want to find the average of the values in the Test 1 column from 0.50 seconds to 4.88 seconds. The limits (0.50 and 4.88) would be the function's parameters.

Arpit Sharma

2 Answers

3

I think you need `between` to build a boolean mask, then filter with boolean indexing and take the mean:

def custom_mean(x,y):
    return df.loc[df['Time'].between(x,y), 'Test 1'].mean()

Sample:

import numpy as np
import pandas as pd

df = pd.DataFrame({'Time':[0.0, 0.25, 0.5, 0.68, 0.94, 1.25, 1.65, 1.88, 2.05, 2.98, 3.45, 3.99, 4.06, 4.68, 4.88, 5.06, 6.0],
                   'Test 1':np.random.randint(10, size=17)})

print (df)
    Test 1  Time
0        3  0.00
1        6  0.25
2        5  0.50
3        4  0.68
4        8  0.94
5        9  1.25
6        1  1.65
7        7  1.88
8        9  2.05
9        6  2.98
10       8  3.45
11       0  3.99
12       5  4.06
13       0  4.68
14       9  4.88
15       6  5.06
16       2  6.00

def custom_mean(x,y):
    return df.loc[df['Time'].between(x,y), 'Test 1'].mean()

print (custom_mean(0.50, 1.0))
5.666666666666667

#verify
print (df.loc[df['Time'].between(0.50, 1.0), 'Test 1'])
2    5
3    4
4    8
Name: Test 1, dtype: int32
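
As a variation, here is a minimal sketch that takes the DataFrame and column names as arguments instead of relying on a global `df`. The names `mean_between`, `frame`, `column` and `time_column` are hypothetical and chosen only for illustration. It relies on `Series.between` including both endpoints by default, so a limit that is not present in the Time column simply selects the rows that fall inside the range.

```python
import pandas as pd

def mean_between(frame, start, end, column="Test 1", time_column="Time"):
    """Mean of `column` over rows where start <= time <= end (both ends inclusive)."""
    mask = frame[time_column].between(start, end)
    return frame.loc[mask, column].mean()

# Small fixed sample (first rows of the data shown above)
sample = pd.DataFrame({
    "Time": [0.0, 0.25, 0.5, 0.68, 0.94, 1.25],
    "Test 1": [3, 6, 5, 4, 8, 9],
})

print(mean_between(sample, 0.5, 1.0))  # uses the rows at 0.50, 0.68 and 0.94
print(mean_between(sample, 0.6, 1.0))  # uses the rows at 0.68 and 0.94
print(mean_between(sample, 7.0, 8.0))  # no rows match, so the result is NaN
```

If no row falls inside the range, the mean of the empty selection is NaN rather than an error, so callers may want to check for that case.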
jezrael
  • When you entered 1.0 as the end limit for the function, did it automatically take the value corresponding to 0.94 seconds? (since 1.0 is not in the column anywhere) – Arpit Sharma Mar 19 '18 at 11:44
  • Yes, exactly. Check the bottom of the answer; it shows exactly what is selected. – jezrael Mar 19 '18 at 11:45
  • Thanks @jezrael. Also, if the starting limit is not present in the column, is it going to take the value corresponding to the time just greater than that? For instance, if I enter starting limit as 0.6, will it take the value corresponding to 0.68? – Arpit Sharma Mar 19 '18 at 11:51
  • Yes, sure, you can check it by `print (df.loc[df['Time'].between(0.6, 1.0)])` – jezrael Mar 19 '18 at 11:52
  • Thanks for the help, @jezrael. I'm quite new to Python – Arpit Sharma Mar 19 '18 at 11:53
  • @ArpitSharma - No problem, every coder starts somewhere :) – jezrael Mar 19 '18 at 11:54
  • @ArpitSharma - Is there a problem? It seems the answer was not accepted. – jezrael Mar 19 '18 at 11:56
  • @ArpitSharma - Thank you. By the way, a small piece of advice: for your next question, it is better to create a small data sample; check [how to provide a great pandas example](http://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples). – jezrael Mar 19 '18 at 11:59
  • I'll take note of that. Thank you. – Arpit Sharma Mar 19 '18 at 12:04
0

You can use a `between` mask together with the `mean` and `std` functions from the NumPy library.
For example, this line computes the mean of Test 1 for rows whose Time is between 0.0 and 5.0:

np.mean(df[df['Time'].between(0.0, 5.0)]['Test 1'])
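
Since `std` is mentioned as well, here is a short sketch of the same mask used for both statistics. The data values are made up for illustration. One point worth knowing: `np.std` uses `ddof=0` (population standard deviation) by default, while the pandas `Series.std` method uses `ddof=1` (sample standard deviation), so the two can give different numbers for the same selection.

```python
import numpy as np
import pandas as pd

# Made-up sample data, for illustration only
df = pd.DataFrame({
    "Time": [0.0, 0.5, 1.0, 2.0, 6.0],
    "Test 1": [2, 4, 4, 6, 9],
})

# Keep only rows with 0.0 <= Time <= 5.0 (the row at 6.0 is dropped)
selected = df.loc[df["Time"].between(0.0, 5.0), "Test 1"]

mean_value = np.mean(selected)          # mean of the selected values
pop_std = np.std(selected.to_numpy())   # NumPy default: ddof=0
sample_std = selected.std()             # pandas default: ddof=1

print(mean_value, pop_std, sample_std)
```

Passing `ddof=1` to `np.std` (or `ddof=0` to `Series.std`) makes the two agree.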
N.Hung