I can't figure out a problem I am trying to solve. I have a pandas DataFrame built from this data:
date, id, measure, result
2016-07-11, 31, "[2, 5, 3, 3]", 1
2016-07-12, 32, "[3, 5, 3, 3]", 1
2016-07-13, 33, "[2, 1, 2, 2]", 1
2016-07-14, 34, "[2, 6, 3, 3]", 1
2016-07-15, 35, "[39, 31, 73, 34]", 0
2016-07-16, 36, "[3, 2, 3, 3]", 1
2016-07-17, 37, "[3, 8, 3, 3]", 1
The measure column consists of arrays in string format.
I want a new moving-average-array column computed from the past 3 measurement records, excluding records where result is 0. "Past 3 records" means that for id 34, the arrays of ids 31, 32 and 33 are used.
The average is taken element-wise: every 1st point is averaged together, then every 2nd, 3rd and 4th point, which gives the moving-average array.
It is not about taking the average of the 1st array, the 2nd array and so on, and then averaging those averages.
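To make the distinction concrete, here is a small sketch with NumPy using the arrays of ids 31, 32 and 33. The variable names are only for illustration and are not part of my real code.

```python
import numpy as np

# Measurements of ids 31, 32 and 33, one row per record.
past = np.array([[2, 5, 3, 3],
                 [3, 5, 3, 3],
                 [2, 1, 2, 2]], dtype=float)

# What I want: average along the records axis, point by point.
elementwise = past.mean(axis=0)   # approx. [2.33, 3.67, 2.67, 2.67]

# What I do NOT want: one average per array, then the average of those.
mean_of_means = past.mean(axis=1).mean()   # a single number, approx. 2.83

print(elementwise)
print(mean_of_means)
```

The first result is still an array of 4 points, which is the shape the new column should have.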
For the first 3 rows, because there is not enough history, I just want to use their own measurement. So the solution should look like this:
date, id, measure, result, Solution
2016-07-11, 31, "[2, 5, 3, 3]", 1, "[2, 5, 3, 3]"
2016-07-12, 32, "[3, 5, 3, 3]", 1, "[3, 5, 3, 3]"
2016-07-13, 33, "[2, 1, 2, 2]", 1, "[2, 1, 2, 2]"
2016-07-14, 34, "[2, 6, 3, 3]", 1, "[2.3, 3.6, 2.6, 2.6]"
2016-07-15, 35, "[39, 31, 73, 34]", 0, "[2.3, 4, 2.6, 2.6]"
2016-07-16, 36, "[3, 2, 3, 3]", 1, "[2.3, 4, 2.6, 2.6]"
2016-07-17, 37, "[3, 8, 3, 3]", 1, "[2.3, 3, 2.6, 2.6]"
The real data is bigger, and result 0 may also occur 2 or more times in a row. I think it comes down to keeping track of the previous OK results properly and averaging those. I spent time on it but could not get it to work.
I am posting the dataframe here:
import pandas as pd

mydict = {'date': {0: '2016-07-11',
                   1: '2016-07-12',
                   2: '2016-07-13',
                   3: '2016-07-14',
                   4: '2016-07-15',
                   5: '2016-07-16',
                   6: '2016-07-17'},
          'id': {0: 31, 1: 32, 2: 33, 3: 34, 4: 35, 5: 36, 6: 37},
          'measure': {0: '[2, 5, 3, 3]',
                      1: '[3, 5, 3, 3]',
                      2: '[2, 1, 2, 2]',
                      3: '[2, 6, 3, 3]',
                      4: '[39, 31, 73, 34]',
                      5: '[3, 2, 3, 3]',
                      6: '[3, 8, 3, 3]'},
          'result': {0: 1, 1: 1, 2: 1, 3: 1, 4: 0, 5: 1, 6: 1}}
df = pd.DataFrame(mydict)
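The "keep track of the previous OK results" idea could look roughly like the sketch below. It is only a sketch under assumptions: the helper name moving_average_arrays and the column name moving_avg are made up, a row with result 0 still receives an average but is never added to the history, and any row falls back to its own measurement until 3 OK records exist. It has not been checked against the real, larger data.

```python
import ast
from collections import deque

import numpy as np
import pandas as pd


def moving_average_arrays(frame, window=3):
    """Element-wise mean of the previous `window` OK measurements per row.

    Hypothetical helper: the name and the fallback rule are assumptions.
    """
    history = deque(maxlen=window)  # only the most recent OK arrays are kept
    averages = []
    for measure, result in zip(frame["measure"], frame["result"]):
        # The measure column holds strings such as "[2, 5, 3, 3]".
        arr = np.array(ast.literal_eval(measure), dtype=float)
        if len(history) < window:
            # Not enough history yet: use the row's own measurement.
            averages.append(arr.tolist())
        else:
            averages.append(np.mean(list(history), axis=0).tolist())
        # Records with result 0 never enter the history.
        if result != 0:
            history.append(arr)
    return averages


df = pd.DataFrame({
    "id": [31, 32, 33, 34, 35, 36, 37],
    "measure": ["[2, 5, 3, 3]", "[3, 5, 3, 3]", "[2, 1, 2, 2]",
                "[2, 6, 3, 3]", "[39, 31, 73, 34]", "[3, 2, 3, 3]",
                "[3, 8, 3, 3]"],
    "result": [1, 1, 1, 1, 0, 1, 1],
})
df["moving_avg"] = moving_average_arrays(df)
print(df[["id", "result", "moving_avg"]])
```

The values are left unrounded here, so id 34 gets 2.333..., 3.666... and so on, rather than the shortened 2.3, 3.6 written in the table above. A plain loop is used instead of rolling because each cell is an array and the window has to skip rows with result 0.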
Thank you for any directions or pointers on how to do this.