
I'm sure there's a NumPy function for this; I just haven't found it yet. Say you have a list of integers of arbitrary size. How can you convert this to a list of ten floats, where each float is the average of that tenth of the original list?

Edit: the best I can come up with is to first convert the list to a NumPy array, then split it into (roughly) equal parts using np.array_split(array, 10), then take the average of each one. But I imagine there must be an easier way to do this.

Jonathan



What you've described is literally a one-liner:

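np.mean(np.array_split(array, 10), axis=1)

array_split works on any "array-like", which includes lists, so the "convert the list to a numpy array" step is automatic.

np.mean also works on any "array-like", which includes the list of arrays returned by array_split. When all the chunks have the same length (that is, when len(array) is a multiple of 10), that list is treated as a 2D array with one chunk per row, so averaging along axis=1 gives one mean per chunk, and the "… of each one" step is automatic too. If the length isn't a multiple of 10, the chunks differ in length and can't be stacked into a 2D array, so average them one at a time instead:

[chunk.mean() for chunk in np.array_split(array, 10)]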

So this really is all there is to it.
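Here is a minimal runnable sketch of both variants. It assumes the input is a plain Python list of integers, and the sample data is made up for illustration. The axis=1 form relies on every chunk having the same length. The per-chunk comprehension also copes with uneven splits.

```python
import numpy as np

# Equal-length case: 30 items split into 10 chunks of 3 each.
data = list(range(30))
# The 10 chunks stack into a (10, 3) array; axis=1 averages each row (chunk).
chunk_means = np.mean(np.array_split(data, 10), axis=1)

# Uneven case: 23 items. array_split gives the first 3 chunks 3 items each
# and the remaining 7 chunks 2 items each, so they can't form a 2D array.
uneven = list(range(23))
# Average each chunk separately instead.
uneven_means = [chunk.mean() for chunk in np.array_split(uneven, 10)]
```

Passing axis=0 instead would average element-wise *across* chunks, giving one value per position within a chunk rather than one value per chunk.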


Note that while "array-like" is used all over the NumPy docs, as far as I'm aware it's never rigorously defined anywhere. But what it basically means is this: if you could call np.array(x) and get back the array you'd naively hope for, then x is array-like. Another Stack Overflow question has an answer that delves into the NumPy source to show exactly how array-like values are handled.
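The "would np.array(x) give you what you hope for?" rule of thumb can be checked directly. This is a small sketch with arbitrary example inputs. The last line swaps in np.fromiter, a separate NumPy function, to show the usual way to turn an iterator into an array.

```python
import numpy as np

# A nested list converts to exactly the 2D array you'd hope for.
nested = np.array([[1, 2, 3], [4, 5, 6]])

# A range is array-like too: you get a 1D integer array.
from_range = np.array(range(5))

# A generator is NOT array-like in this sense: np.array doesn't iterate it,
# it just wraps the generator object itself in a 0-d object array.
from_gen = np.array(x * x for x in range(5))

# np.fromiter is the tool for consuming an iterator into an array.
from_iter = np.fromiter((x * x for x in range(5)), dtype=int)
```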

Also notice that some NumPy functions don't actually say what types they expect. For example, mean explicitly says it takes an array_like for a, but array_split doesn't say what it takes (and if you tried to guess by following the link to split, you'd guess wrong). So, sometimes, the only way to find out whether an array-like is acceptable is to test it. But this is a trivial test, so that's not a big deal.
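The "trivial test" can look something like this. It is a quick sketch that feeds array_split a few non-ndarray inputs and checks what comes back. The input values are arbitrary.

```python
import numpy as np

# Non-ndarray inputs to try; each holds the same five values.
inputs = {
    "list": [1, 2, 3, 4, 5],
    "tuple": (1, 2, 3, 4, 5),
    "range": range(1, 6),
}

# Split each into 2 sections: 5 items become chunks of sizes 3 and 2,
# and each chunk comes back as an ndarray regardless of the input type.
results = {name: np.array_split(value, 2) for name, value in inputs.items()}
```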


Of course, if you don't want to use NumPy, you don't have to. It's not much harder to do directly on lists; you just need explicit loops:

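import statistics
# assumes len(array) is a multiple of 10; otherwise the leftover items at the end are dropped
chunksize = len(array) // 10
chunks = (array[i*chunksize:(i+1)*chunksize] for i in range(10))
means = [statistics.mean(chunk) for chunk in chunks]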

And of course you can collapse all of that into one horrible one-liner if you want to, or, better, wrap the chunking lines in a function that does the same thing as array_split, so you can then just do:

means = [statistics.mean(chunk) for chunk in my_array_split(array, 10)]

… which doesn't look all that different from the NumPy one-liner, it just makes the loop explicit.
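One way to write that helper is sketched below. my_array_split is a hypothetical name, not a library function. The sketch assumes a flat sliceable sequence such as a list, and it mimics array_split's documented behavior: when the length doesn't divide evenly, the first len(seq) % sections chunks get one extra item, so nothing is dropped.

```python
import statistics


def my_array_split(seq, sections):
    """Hypothetical pure-Python stand-in for np.array_split on a flat list.

    The first len(seq) % sections chunks get one extra item, matching
    array_split, so no items are lost when the split is uneven.
    """
    each, extras = divmod(len(seq), sections)
    chunks = []
    start = 0
    for i in range(sections):
        size = each + 1 if i < extras else each
        chunks.append(seq[start:start + size])
        start += size
    return chunks


array = list(range(23))
means = [statistics.mean(chunk) for chunk in my_array_split(array, 10)]
```

With fewer items than sections, some chunks come back empty. statistics.mean raises StatisticsError on an empty chunk, so very short inputs need a guard.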

abarnert