4

Is there a way to "compress" an array in python so as to keep the same range but simply decrease the number of elements to a given value?

For example, I have an array with 1000 elements and I want to modify it to have 100. Specifically, I have a numpy array created by

x = linspace(-1,1,1000)

But because of the way I am using it in my project, I can't simply recreate it using linspace, as it will not always span the domain of -1 to 1 or have 1000 elements. These parameters change, and I don't have access to them in the function I am defining. So I need a way to compress the array while keeping the -1 to 1 mapping. Think of it as decreasing the "resolution" of the array. Is this possible with any built-in functions or different libraries?
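To sketch the behavior I'm after (compress is just a hypothetical name here, not a real function):

x = np.linspace(-1, 1, 1000)  # what I have: 1000 elements from -1 to 1
y = compress(x, 100)          # what I want: 100 elements, still running from -1 to 1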

Alex
  • In your example, do you want to remove 9 out of every 10 elements to make the array of length 1000 an array of length 100? – David Greydanus Sep 07 '14 at 19:09
  • Is the data linear, and *evenly spaced*? – wwii Sep 07 '14 at 19:24
  • @DavidGreydanus That's essentially what I want to do, as long as the result keeps the array as a good approximation to what it was before. – Alex Sep 07 '14 at 19:35

2 Answers

3

A simple way to "resample" your array is to group it into chunks, then average each chunk:

(Chunking function is from this answer)

import numpy as np

# Chunking function: yield successive n-sized pieces of l
def chunks(l, n):
    for i in range(0, len(l), n):
        yield l[i:i+n]

# Resampling function: average each chunk
# (assumes len(arr) is an exact multiple of newLength)
def resample(arr, newLength):
    chunkSize = len(arr) // newLength
    return [np.mean(chunk) for chunk in chunks(arr, chunkSize)]

# Example:
x = np.linspace(-1, 1, 15)
y = resample(x, 5)
print(y)
# Result:
# [-0.85714285714285721, -0.4285714285714286, -3.7007434154171883e-17, 0.42857142857142844, 0.8571428571428571]

As you can see, the range of the resampled array does drift inward, but this effect would be much smaller for larger arrays.
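As an aside (not part of the chunk-averaging approach above; a minimal sketch assuming a 1-D numpy array, with interpResample being a name I'm making up here), you can avoid the inward drift by resampling with linear interpolation via numpy.interp, which keeps the first and last elements exactly:

import numpy as np

def interpResample(arr, newLength):
    # Evaluate arr at newLength evenly spaced fractional indices;
    # positions 0 and len(arr)-1 map exactly to arr[0] and arr[-1]
    newPositions = np.linspace(0, len(arr) - 1, newLength)
    return np.interp(newPositions, np.arange(len(arr)), arr)

x = np.linspace(-1, 1, 15)
print(interpResample(x, 5))
# [-1.  -0.5  0.   0.5  1. ]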

It's not clear to me whether the arrays will always be generated by numpy.linspace or not. If so, there are simpler ways of doing this, like simply picking every nth member of the original array, where n is determined by the "compression" ratio:

def linearResample(arr, newLength):
    # Take every nth element; assumes len(arr) is an exact multiple of newLength
    spacing = len(arr) // newLength
    return arr[::spacing]
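For instance, with the 1000-element array from the question (a usage sketch; the printed value is approximate):

x = np.linspace(-1, 1, 1000)
y = linearResample(x, 100)  # takes x[0], x[10], ..., x[990]
# y[0] is exactly -1, but y[-1] == x[990], about 0.98198: the top of the
# range is clipped unless (len(arr) - 1) is a multiple of the spacing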
Brionius
  • I can probably restrict it to the case where `numpy.linspace` is always generating the array I need to compress. What are the simpler ways in that case? – Alex Sep 07 '14 at 19:33
  • Edited with simpler version for arrays that are guaranteed to be linear. – Brionius Sep 07 '14 at 19:40
1

You could pick items at random to reduce any bias you have in the reduction. If the original sample is unordered, it would just be:

import random

sample = list(range(1000))

def reduce(sample, count):
    # Shuffle a copy and keep the first `count` items
    work = sample[:]
    random.shuffle(work)
    return work[:count]

If order matters, use enumerate to track each item's position, then reassemble:

def reduce(sample, count):
    # Pair each item with its index, shuffle, keep `count` pairs,
    # then sort by index to restore the original order
    indexed = list(enumerate(sample))
    random.shuffle(indexed)
    trimmed = indexed[:count]
    trimmed.sort()
    return [item for index, item in trimmed]
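A quick usage sketch (using the `sample` defined above):

reduced = reduce(sample, 100)
print(len(reduced))  # 100
# The surviving items keep their original relative order; since `sample`
# itself was sorted, the reduced list is sorted too
assert reduced == sorted(reduced)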
tdelaney