I want to draw a chart in my Python application, but the source NumPy array is too large to plot directly (1,000,000+ elements). I want to replace groups of neighboring elements with their mean value. My first idea was to do it C++-style:

step = 19000 # every 19 seconds (for example) make a new point with the mean value
dt = <ordered array with time stamps>
value = <some random data that we want to draw>

index = dt - dt % step
cur = 0
res = []

while cur < len(index):
    next = cur
    while next < len(index) and index[next] == index[cur]:
        next += 1
    res.append(np.mean(value[cur:next]))
    cur = next

But this solution is very slow. I then tried this:

step = 19000 # every 19 seconds (for example) make a new point with the mean value
dt = <ordered array with time stamps>
value = <some random data that we want to draw>

index = dt - dt % step
data = np.arange(index[0], index[-1] + 1, step)
res = [value[index == i].mean() for i in data]

This solution is slower than the first one. What is the best solution for this problem?

Artem Mezhenin

1 Answer

np.histogram can provide sums over arbitrary bins. If you have a time series, e.g.:

import numpy as np

data = np.random.rand(1000)          # Random numbers between 0 and 1
t = np.cumsum(np.random.rand(1000))  # Random increasing time series, from about 0 to about 500

then you can calculate the binned sums across 5 second intervals using np.histogram:

t_bins = np.arange(0., 500., 5.)       # Or whatever range you want
sums = np.histogram(t, t_bins, weights=data)[0]
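To make the effect of the weights argument easy to check by hand, here is a tiny sketch with made-up values (the arrays below are illustrative only, not from the question). Each bin's result is the sum of the data values whose timestamps fall inside that bin:

```python
import numpy as np

# Made-up illustration data: four samples falling into two 5-second bins
t = np.array([1.0, 2.0, 6.0, 7.0])
data = np.array([10.0, 20.0, 30.0, 50.0])
t_bins = np.array([0.0, 5.0, 10.0])   # bin edges: [0, 5) and [5, 10]

# With weights, each sample contributes its data value instead of 1
sums = np.histogram(t, t_bins, weights=data)[0]
print(sums)   # sum of data per bin
```

Note that samples outside the outermost bin edges are silently ignored, so the edges should cover the whole time range you care about.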

If you want the mean rather than the sum, divide the sums by the bin tallies, which you get by calling np.histogram without the weights:

means = sums / np.histogram(t, t_bins)[0]

This method is similar to the one in this answer.
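Applied to the variables in the question, the histogram approach might look like the sketch below. The function name binned_mean and the sample arrays are hypothetical, and the code assumes dt and value are one-dimensional arrays of equal length. Bins that contain no samples are dropped, which avoids the division by zero that the plain sums / counts expression would hit:

```python
import numpy as np

def binned_mean(dt, value, step):
    """Mean of `value` over consecutive bins of width `step` along `dt`.

    Hypothetical helper: returns (bin_starts, means) for non-empty bins only.
    """
    dt = np.asarray(dt)
    value = np.asarray(value, dtype=float)

    # Align the first edge to a multiple of step, like dt - dt % step
    start = dt.min() - dt.min() % step
    # Enough bins that the last edge lies strictly above the largest timestamp
    n_bins = int((dt.max() - start) // step) + 1
    edges = start + step * np.arange(n_bins + 1)

    sums = np.histogram(dt, edges, weights=value)[0]
    counts = np.histogram(dt, edges)[0]

    keep = counts > 0                      # skip empty bins
    return edges[:-1][keep], sums[keep] / counts[keep]

# Made-up sample: timestamps in milliseconds, step of 19 seconds
dt = np.array([0, 1000, 18999, 19000, 20000, 57000])
value = np.array([1.0, 2.0, 3.0, 4.0, 6.0, 10.0])
starts, means = binned_mean(dt, value, 19000)
print(starts, means)
```

Both np.histogram calls are vectorized, so this avoids the Python-level loop over bins used in the question.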

marshall.ward