I have a large pandas data frame with two columns, Time and Values. I want to calculate consecutive averages of the values in column Values, where the group size is determined by column Time. That is, I want the average of the first l values in Values, then of the next l values, and so on to the end of the data frame. Here l is the number of values that go into each average, and it is determined by a time difference in column Time. The starting data frame looks like this:

Time   Values
t1     v1
t2     v2
t3     v3
...    ...
tk     vk

For example, an average needs to be taken every 2 seconds, and the number of Time values that fall inside that 2-second interval determines the number l of values to average. a1 would be the average of the first l values, a2 the average of the next l values, and so on.

The second part of the question is the same calculation of averages when the number l is known in advance. I tried this:

 df['Time'].iloc[0:l].mean()

which works for the first l values.

In addition, I need to store the averages in another data frame with columns Time and Averages, for plotting with matplotlib.

How can I use pandas to achieve my goal?

I have tried the following

import pandas as pd
df = pd.DataFrame({'Time': [1595006371.756430732, 1595006372.502789381, 1595006373.784446912, 1595006375.476658051], 'Values': [4, 5, 6, 10]}, index=list('abcd'))

I get

                   Time  Values
a  1595006371.756430732       4
b  1595006372.502789381       5
c  1595006373.784446912       6
d  1595006375.476658051      10

Time is in the format seconds.milliseconds.

If I expect the same number of values in every 2-second interval up to the end of the data frame, I can use the following loop to calculate l:

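s = 1
l = 0
# .iloc is used because the index holds string labels ('a', 'b', ...), not positions
while df['Time'].iloc[s] - df['Time'].iloc[0] <= 2:
    s += 1
    l += 1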

Could this be done without the loop? And how can I do this if l is not expected to be the same in each averaging interval?
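One possible loop-free approach, sketched here under the assumption that Time is sorted in ascending order and that the 2-second intervals are anchored at the first timestamp: `np.searchsorted` counts the rows in the first interval, and a floor division of the elapsed time labels every row with its interval, which also covers the case where l differs between intervals. The names `window`, `bins` and `df1` are illustrative choices, not part of the question.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {"Time": [1595006371.756430732, 1595006372.502789381,
              1595006373.784446912, 1595006375.476658051],
     "Values": [4, 5, 6, 10]},
    index=list("abcd"),
)

window = 2.0  # averaging interval in seconds (assumed)
t = df["Time"].to_numpy()

# Rows whose Time lies within `window` seconds of the first row.
# This count includes the first row itself.
l = int(np.searchsorted(t, t[0] + window, side="right"))

# Variable l: label each row with the number of the interval it falls in.
bins = ((t - t[0]) // window).astype(int)

# One output row per interval: mean Time and mean Values.
df1 = (
    df.groupby(bins)
      .agg(Time=("Time", "mean"), Averages=("Values", "mean"))
      .reset_index(drop=True)
)
```

For the sample data, the first two rows fall in the first interval and the last two in the second. Intervals that contain no rows simply do not appear in `df1`.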

For a given l, I want to calculate the averages of l consecutive elements in another column, for example Values, and populate the column Averages of a data frame df1 with them. I tried the following code (here with l = 2):

p = 0
df1 = pd.DataFrame(columns=['Time', 'Averages'])
for w in range(0, len(df) - 1, 2):
    # average each block of 2 consecutive rows; .loc avoids chained assignment
    df1.loc[p, 'Averages'] = df['Values'].iloc[w:w+2].mean()
    p = p + 1

Is there another way to calculate these averages?
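When l is known in advance, one way to avoid the explicit loop is to group rows by their position divided by l. The sketch below assumes l = 2 and uses the row position rather than the index, so it does not depend on the index labels. The names `grp` and `df1` are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {"Time": [1595006371.756430732, 1595006372.502789381,
              1595006373.784446912, 1595006375.476658051],
     "Values": [4, 5, 6, 10]},
    index=list("abcd"),
)

l = 2  # number of rows per average (assumed known in advance)

# Row position // l gives 0, 0, 1, 1, ... regardless of the index labels.
grp = np.arange(len(df)) // l

# One output row per block of l rows: mean Time and mean Values.
df1 = (
    df.groupby(grp)
      .agg(Time=("Time", "mean"), Averages=("Values", "mean"))
      .reset_index(drop=True)
)
```

If the number of rows is not a multiple of l, the last group is shorter and its average covers only the remaining rows. The result can then be plotted with, for example, `df1.plot(x="Time", y="Averages")`.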

To clarify a bit more: I have two columns, Time and Values, and I want to determine how many consecutive values from column Values should be averaged at a time. I determine this number l from column Time by counting how many rows fall inside a time difference of 2 seconds. Once I have that value, for example 2, I average the first two values from column Values, then the next 2, and so on to the end of the data frame. Finally, I store these averages in a separate column of another data frame.

I would appreciate your assistance.

Topaz
    It would be good to test the code if you could create a minimum reproducible example: https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples – David Erickson Jul 31 '20 at 19:48

1 Answer

You talk about Time and Value and then groups of consecutive rows.

If you want to group consecutive rows and get the mean of Time and Value, this does it for you. You really need to show by example what you are trying to achieve.

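import datetime as dt
import random
import pandas as pd

# sample data: one row every 15 minutes over one day, with random values
d = list(pd.date_range(dt.datetime(2020,7,1), dt.datetime(2020,7,2), freq="15min"))
df = pd.DataFrame({"Time":d, 
      "Value":[round(random.uniform(0, 1),6) for x in d]})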

df

n = 5
df.assign(grp=df.index//n).groupby("grp").agg({"Time":lambda s: s.mean(),"Value":"mean"})

Rob Raymond
  • In my question, df.dtypes shows float64 for the Time column. How can I use float64 values inside dt.datetime? – Topaz Aug 01 '20 at 14:06
  • I tried the last line for n=2: df.assign(grp=df.index//2).groupby("grp").agg({"Time":lambda s: s.mean(),"Values":"mean"}) and I get an error: TypeError: cannot perform __floordiv__ with this index type: Index – Topaz Aug 01 '20 at 14:07
  • I tried df.assign(grp=df.index).groupby("grp").agg({"Time":lambda s: s.mean(),"Values":"mean"}) and this produces no error. However, it calculates averages over rows that share the same index value, and in my case all index values are unique. So I used df.assign(grp=5).groupby("grp").agg({"Time":lambda s: s.mean(),"Values":"mean"}) and it showed no errors. Could you explain the meaning of grp=5, since the same result is also obtained for other values (grp=4, ..., grp=1)? – Topaz Aug 01 '20 at 14:40
  • `assign(grp=5)` adds a column to df in which every row has the value 5, since 5 is a constant. So there is a single group that contains all rows. – Rob Raymond Aug 01 '20 at 15:25
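Regarding the float64 Time column raised in the comments: a minimal sketch, assuming the floats are Unix epoch seconds, is to convert them with `pd.to_datetime(..., unit="s")` and then average over 2-second windows with `resample`. The `origin="start"` argument (available in pandas 1.1 and later) anchors the windows at the first timestamp; without it the windows are aligned to the start of the day and the grouping can differ. The names `ts` and `df1` are illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    {"Time": [1595006371.756430732, 1595006372.502789381,
              1595006373.784446912, 1595006375.476658051],
     "Values": [4, 5, 6, 10]},
    index=list("abcd"),
)

# Interpret the float64 values as seconds since the Unix epoch (assumption).
ts = pd.to_datetime(df["Time"], unit="s")

# Average Values over consecutive 2-second windows starting at the first row.
df1 = (
    df.set_index(ts)["Values"]
      .resample("2s", origin="start")
      .mean()
      .rename("Averages")
      .reset_index()
)
```

Here `df1` has a datetime column Time holding the start of each window and a column Averages. A window that contains no rows shows up with NaN in Averages. This sidesteps the `__floordiv__` error, because the grouping no longer depends on the string index.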