
I have multiple long lists in my program. Each list has approximately 3000 float values, and there are around 100 such lists.

I want to reduce the size of each list to, say, 500, while preserving the information in the original list. I know that it is not possible to preserve the information completely, but I would like every element of the original list to contribute to the values of the smaller list.

Let's say we have the following lists, and want to shorten each one to a list of size 3 or 4.

myList = [[4.3, 2.3, 5.1, 6.4, 3.2, 7.7, 1.5, 6.5, 7.4, 4.1],
          [7.3, 3.5, 6.2, 7.4, 2.6, 3.7, 2.6, 7.1, 3.4, 7.1],
          [4.7, 2.6, 5.6, 7.4, 3.7, 7.7, 3.5, 6.5, 7.2, 4.1],
          [7.3, 7.3, 4.1, 6.6, 2.2, 3.9, 1.6, 3.0, 2.3, 4.6],
          [4.7, 2.3, 5.7, 6.4, 3.4, 6.8, 7.2, 6.9, 8.4, 7.1]]

Is there some way to do this? Maybe by averaging of some sort?

Tree Big
  • I would use the mean and standard deviation. Maybe also include the range (max value - min value) or even the max and min values? Maybe also the mode (i.e. the value that occurs most often). I do not think there is a rule of thumb; it really depends on your goal. – ko3 Jul 14 '22 at 06:38
  • What kind of data is this? Would it make sense to recast the problem as a linear algebra one and use dimensionality reduction techniques like singular value decomposition (SVD)? – ndc85430 Jul 14 '22 at 06:45
  • @ko3. Thank you for the answer. Could you please give a working example using the list in the question? Thanks – Tree Big Jul 14 '22 at 06:45
  • The process you are describing is known as "resampling" – mousetail Jul 14 '22 at 06:47
  • @ndc85430. These are image pixel values. I am not sure if SVD would be a good option for this case or not. What do you think? – Tree Big Jul 14 '22 at 06:47
  • Basic process is to first resize the list to be N*M numbers, then apply a lowpass filter to remove frequencies above the desired resolution, then sample one every M samples to create the shorter list – mousetail Jul 14 '22 at 06:49
  • @mousetail. Ahh yes! That's it. Was unable to find the correct term for it. Thanks. Could you tell me how to go about resampling a list? – Tree Big Jul 14 '22 at 06:49
  • You have to know what's important about these values. Is it the trend? The average? The sum? The peaks and valleys? The only way to compress data is to understand what you can do without damaging the data. There are established algorithms for compressing pixel data (like JPEG). – Tim Roberts Jul 14 '22 at 06:49
  • @TimRoberts. It is the average of this data that is important for me. – Tree Big Jul 14 '22 at 06:50
  • I'm not trying to be flippant, but if that's the case, you can reduce the list to one number. – Tim Roberts Jul 14 '22 at 06:51
  • @TimRoberts. I understand, but I still would need a list for some further processing that I need to do. – Tree Big Jul 14 '22 at 06:53
  • I recommend taking a look at [Python Imaging Library](https://pillow.readthedocs.io/en/stable/reference/Image.html#PIL.Image.Image.resize). You can resize the image with choice among options of resampling filters – Oluwafemi Sule Jul 14 '22 at 07:07
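
The resampling process mousetail describes (group consecutive samples, reduce each group to one value) can be sketched in plain NumPy. This is only a minimal sketch: the function name `downsample_by_mean` is made up for illustration, and a factor of 6 is what would map 3000 values down to 500.

```python
import numpy as np

def downsample_by_mean(values, factor):
    """Shrink a 1-D sequence by averaging consecutive groups of `factor` samples."""
    arr = np.asarray(values, dtype=float)
    # Trim any leftover tail so the length divides evenly by `factor`.
    usable = len(arr) - len(arr) % factor
    return arr[:usable].reshape(-1, factor).mean(axis=1)

row = [4.3, 2.3, 5.1, 6.4, 3.2, 7.7, 1.5, 6.5, 7.4, 4.1]
print(downsample_by_mean(row, 2))  # five values, one average per pair
```

Every original element contributes to exactly one output value, which matches the requirement in the question. For a proper lowpass-then-decimate pipeline, `scipy.signal.resample` does the frequency-domain version of the same thing.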

2 Answers


You can do something like this:

from statistics import mean, stdev

myList = [[4.3, 2.3, 5.1, 6.4, 3.2, 7.7, 1.5, 6.5, 7.4, 4.1], [2.3, 6.4, 3.2, 7.7, 1.5, 6.5, 7.4, 4.1]]

shorten_list = [[max(i)-min(i), mean(i), round(stdev(i), 5)] for i in myList]

You can also include information such as the sum of the list or the mode. If you only want the mean of each list within your list, you can simply do this:

from statistics import mean

mean_list = list(map(mean, myList))
ko3
  • Thank you for your answer. Also, do you think that taking averages of a fixed number of elements in a list for achieving something like this is a good idea? – Tree Big Jul 14 '22 at 07:01
  • @TreeBig, well obviously any kind of reduction is error prone. I am not sure what you are trying to do, therefore I cannot give you any kind of recommendation on that. But you definitely need a starting point and a simple `mean` is one. Maybe start with the mean and if you feel like your result does not satisfy you, then include other information. – ko3 Jul 14 '22 at 07:12

Batching may work. Take a look at this question:

How do I split a list into equally-sized chunks?

This splits the list into equal-sized batches.
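
Combining that chunking with the averaging discussed in the comments, a minimal sketch (the helper name `chunked_means` is made up here) could look like this:

```python
from statistics import mean

def chunked_means(values, chunk_size):
    """Split `values` into chunks of `chunk_size` and replace each chunk by its mean."""
    return [mean(values[i:i + chunk_size])
            for i in range(0, len(values), chunk_size)]

# Second row from the question, reduced from 10 values to 2.
print(chunked_means([7.3, 3.5, 6.2, 7.4, 2.6, 3.7, 2.6, 7.1, 3.4, 7.1], 5))
```

With chunk_size=6, this would take each 3000-element list down to the 500 elements asked for in the question.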

Alternatively, you can reduce the dimension of the list using a max-pooling layer:

import numpy as np
from keras.models import Sequential
from keras.layers import MaxPooling2D

image = np.array([[4.3, 2.3, 5.1, 6.4, 3.2, 7.7, 1.5, 6.5, 7.4, 4.1],
                  [7.3, 3.5, 6.2, 7.4, 2.6, 3.7, 2.6, 7.1, 3.4, 7.1],
                  [4.7, 2.6, 5.6, 7.4, 3.7, 7.7, 3.5, 6.5, 7.2, 4.1],
                  [7.3, 7.3, 4.1, 6.6, 2.2, 3.9, 1.6, 3.0, 2.3, 4.6],
                  [4.7, 2.3, 5.7, 6.4, 3.4, 6.8, 7.2, 6.9, 8.4, 7.1]])

# Shape the data as (batch, height, width, channels) for the pooling layer.
image = image.reshape(1, 5, 10, 1)

model = Sequential([MaxPooling2D(pool_size=(1, 10), strides=(1, 1))])
output = model.predict(image)
print(output)

This gives the output:

[[[[7.7]]

  [[7.4]]

  [[7.7]]

  [[7.3]]

  [[8.4]]]]

If you want a different output size, change the pool size.
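
If pulling in Keras only for pooling feels heavy, the same per-row max pooling can be sketched in plain NumPy; this is an equivalent reformulation, not the answer's method, and it assumes the pool width divides the row length evenly:

```python
import numpy as np

image = np.array([[4.3, 2.3, 5.1, 6.4, 3.2, 7.7, 1.5, 6.5, 7.4, 4.1],
                  [7.3, 3.5, 6.2, 7.4, 2.6, 3.7, 2.6, 7.1, 3.4, 7.1],
                  [4.7, 2.6, 5.6, 7.4, 3.7, 7.7, 3.5, 6.5, 7.2, 4.1],
                  [7.3, 7.3, 4.1, 6.6, 2.2, 3.9, 1.6, 3.0, 2.3, 4.6],
                  [4.7, 2.3, 5.7, 6.4, 3.4, 6.8, 7.2, 6.9, 8.4, 7.1]])

pool = 10  # pool width; must divide the row length (here 10)
# Group each row into windows of `pool` values, then take the max per window.
pooled = image.reshape(image.shape[0], -1, pool).max(axis=2)
print(pooled)  # one max per row: [[7.7], [7.4], [7.7], [7.3], [8.4]]
```

Replacing `.max(axis=2)` with `.mean(axis=2)` gives average pooling, which is closer to the averaging asked about in the question.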

gsv
  • So I can split my list into smaller lists of equal sizes, take their averages and make a new smaller list from those averages? P.S.: I have added some data to the question. – Tree Big Jul 14 '22 at 07:10