
I have two lists of data: one with dates and average values, the other with an index and another value. They look like this:

#list 1: (in total 62 rows)
date      d_mean
2017-3-1  15.3
2017-3-2  16.9
2017-3-3  18.2
...
2017-4-30 17.7
2017-5-1  15.6

#list 2: (in total 10 rows)
     sum
121  555
122  784
123  546
...
142  568
143  658
144  847

I want to:

  1. group the dates into weeks
  2. calculate the average value for every week and create a new list with columns 'week' and 'w_mean'
  3. make these two lists the same size, then plot them against each other

I tried to use

    chunks = [avg_T1[x:x+7] for x in range(4, len(avg_T1), 7)]

but this only splits the list into chunks. I also tried adding

    .mean()

at the end, but that doesn't really work.
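
In other words, what I'm hoping to end up with is roughly something like this (assuming avg_T1 is a plain list of the daily means):

    # mean of every 7-value chunk -- roughly the weekly averages I'm after
    chunks = [avg_T1[x:x+7] for x in range(4, len(avg_T1), 7)]
    w_mean = [sum(chunk) / len(chunk) for chunk in chunks]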

I'm new to Python, so I'd also be glad to hear about any materials I should read or practice I should do to help me get used to using Python.

Jenny
  • Do you have a copy-pastable example of your data? – Eric Duminil Oct 19 '17 at 13:05
  • Using pandas, convert the date column to a datetime index, [resample](https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.resample.html) to week frequency, and take the mean. – Ilja Everilä Oct 19 '17 at 13:30

2 Answers


I'm not sure I fully understand your question, but the answer to part one should be

df = df.set_index('date')
df['w_mean'] = df.resample('W').d_mean.mean()

This averages over each week of the DataFrame (which I think is what you are calling lists).

I don't know how you'd make the two lists the same length since they are at different frequencies over the same time period.

What you will have is a column ('w_mean') that has a value every seven days and NaNs everywhere else.

Calling df.interpolate().plot() is the closest thing I can think of.
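
For illustration, here is a minimal end-to-end sketch of that approach on made-up daily data (the column names follow the question, and the plot assumes matplotlib is installed):

import pandas as pd
import matplotlib.pyplot as plt

# toy daily data shaped like "list 1" in the question
df = pd.DataFrame({'date': pd.date_range('2017-03-01', '2017-05-01'),
                   'd_mean': range(62)})
df = df.set_index('date')

# the weekly means land on the week-end dates; every other row becomes NaN
df['w_mean'] = df.resample('W').d_mean.mean()

# fill the gaps between weekly values and plot both columns together
df.interpolate().plot()
plt.show()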

RSHAP

It's hard to answer without the original data, but you could use pandas.DataFrame.groupby with a custom function: either the day of the year divided by 7, or strftime('%V') to get the week number:

import pandas as pd
import datetime
import numpy as np

# build a dummy daily series from 2017-03-01 to 2017-05-01 with random values
start = datetime.datetime(2017, 3, 1)
end = datetime.datetime(2017, 5, 2)
daterange = [start + datetime.timedelta(days=x) for x in range(0, (end - start).days)]
data = pd.DataFrame({'date': daterange, 'v': np.random.randint(0, 100, size=len(daterange))}).set_index('date')

data now looks like:

             v
date          
2017-03-01   1
2017-03-02  40
2017-03-03  49
2017-03-04  27
2017-03-05  37
2017-03-06  90
2017-03-07  88
2017-03-08  42
2017-03-09  90
2017-03-10  84
2017-03-11  47
2017-03-12  24
2017-03-13  18
2017-03-14  24
2017-03-15  38
2017-03-16  71
2017-03-17  11
2017-03-18  40
...

Here's a result with strftime:

>>> data.groupby(lambda d: d.strftime('%V')).mean()
            v
09  30.800000
10  66.428571
11  36.285714
12  40.857143
13  55.428571
14  56.285714
15  52.000000
16  39.714286
17  39.714286

Here's a result with day of year / 7:

>>> data.groupby(lambda d: d.timetuple().tm_yday//7).mean()
            v
8   30.000000
9   65.428571
10  33.285714
11  48.857143
12  46.571429
13  58.000000
14  47.428571
15  41.428571
16  45.285714
17  33.500000

It depends on how you define a week.

Note that neither approach would work if the data spans more than a year. You should probably use resample, as mentioned by @RSHAP.
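
A rough sketch of that resample-based alternative, reusing the data frame built above (by default 'W' anchors weeks on Sundays; the week/w_mean names are only there to match the layout asked for in the question):

# weekly means via resample; this stays correct even if the data spans several years
weekly = data.resample('W').mean()

# reshape into the 'week' / 'w_mean' layout from the question
weekly = weekly.rename(columns={'v': 'w_mean'}).reset_index().rename(columns={'date': 'week'})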

Eric Duminil