
I need some help binning my data values. I need a histogram-like function, but instead of counting the occurrences in each bin, I want the sum of the values in each bin.

In my example below I have a list with the number of Twitter followers for 30 days. Let's say I want 10 bins; then each bin would take the values of 30 / 10 = 3 days. For the first three days, the value for bin 1 would be 1391 + 142 + 0 = 1533, for bin 2 it would be 12618, and so on, up to bin 10.

Both the number of bins and the duration may vary. It also needs to work for a duration of 31 days and 5 bins, for instance.

Does anyone know how to do this efficiently? Is there a Python function that does this? Otherwise, a for loop that sums every n values in the list until the end of the duration would also work.

All help would be highly appreciated :) Thanks!

    followersList = [1391, 142, 0, 0, 12618, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 456, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]

    duration = 30
    bins = 10
    binWidth = round(duration / bins)

    #
    # for loop or python function that sums values for each bin
    #
Maurice Stam

3 Answers


You can do it like this:

    # Python 3: use range (xrange exists only in Python 2)
    bin_width = int(round(duration / bins))
    followers = [sum(followersList[i:i+bin_width]) for i in range(0, duration, bin_width)]
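
When the list length is not an exact multiple of the bin count (for example 31 days and 5 bins), a rounded-down width of 6 produces six bins instead of five. A minimal sketch of one way around this, assuming Python 3 and a hypothetical helper named `sum_bins` that rounds the width up so the last bin is simply shorter; the sample data is modeled on the question's list:

```python
import math

def sum_bins(values, bins):
    """Sum consecutive chunks of `values`, returning at most `bins` sums.

    `sum_bins` is a hypothetical helper, not a library function.
    Assumes a non-empty list and bins >= 1.
    """
    # Round the width up so the chunks never outnumber `bins`;
    # the final chunk may hold fewer items than the others.
    width = math.ceil(len(values) / bins)
    return [sum(values[i:i + width]) for i in range(0, len(values), width)]

# Data modeled on the question: 30 days, then 31 days
days_30 = [1391, 142, 0, 0, 12618] + [0] * 12 + [456] + [0] * 12
days_31 = days_30 + [0]

print(sum_bins(days_30, 10))  # [1533, 12618, 0, 0, 0, 456, 0, 0, 0, 0]
print(sum_bins(days_31, 5))   # [14151, 0, 456, 0, 0]
```

Note that rounding up can yield fewer than `bins` sums for some combinations (9 values and 6 bins give 5 chunks), so treat the bin count as an upper bound.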
Eugene Soldatov
  • Thanks, great answer! Don't have that much experience with list comprehensions. Now I'm able to feed the Twitter features to my machine learning models again :) – Maurice Stam Nov 11 '15 at 13:30
  • Is there no simpler solution to this problem provided by numpy, scipy or similar? – NeStack Nov 04 '21 at 20:16

Another way of doing it is with reshape and sum. I know you already have a valid answer, but it is worth getting practice with numpy array operations.

    import numpy

    # This works when the list divides exactly into bins
    followersList = [1391, 142, 0, 0, 12618, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 456, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
    duration = len(followersList)
    bins = 10
    binWidth = round(duration / bins)
    print(numpy.array(followersList).reshape(bins, binWidth).sum(axis=1))

    # Otherwise we have to pad with zeros until the length is a multiple of the bin width
    followersList = [1391, 142, 0, 0, 12618, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 456, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1]
    binWidth = 3
    bins = (len(followersList) - 1) // binWidth + 1  # ceiling division
    print(
        numpy.pad(followersList, (0, bins * binWidth - len(followersList)), 'constant').reshape(bins, binWidth).sum(axis=1))
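
If padding feels heavy, `numpy.add.reduceat` is another option worth trying. It is a different technique from reshape and sum: it adds up the slices between consecutive start indices, and the last slice runs to the end of the array, so a short final bin needs no padding. A minimal sketch, assuming a fixed bin width of 3 and a 31-item list modeled on the data above (the names `values`, `starts` and `bin_sums` are only illustrative):

```python
import numpy

# 31 values modeled on the example data (the trailing 1 makes the length uneven)
values = numpy.array([1391, 142, 0, 0, 12618] + [0] * 12 + [456] + [0] * 12 + [1])
bin_width = 3

# Start index of every bin: 0, 3, 6, ..., 30
starts = numpy.arange(0, len(values), bin_width)

# Each output element is the sum of values[starts[i]:starts[i + 1]];
# the last one covers values[starts[-1]:]
bin_sums = numpy.add.reduceat(values, starts)
print(bin_sums.tolist())  # [1533, 12618, 0, 0, 0, 456, 0, 0, 0, 0, 1]
```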
user237329

I encountered the same problem. I expected numpy or scipy to provide a function for this, but I couldn't find one. The closest I found is this:

    import numpy as np

    bins = 10
    sum_of_bins = [np.sum(arr) for arr in np.array_split(followersList, bins)]

It uses np.array_split to split the large array into smaller arrays, each of which is then summed. You could also use np.split, but it raises an error if followersList can't be divided evenly into bins.
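
To make the difference concrete, here is a small sketch on a toy array of 7 items split into 3 bins (the array and variable names are only illustrative). With an uneven length, `np.array_split` gives the leading chunks one extra item each, while `np.split` raises a `ValueError`:

```python
import numpy as np

values = np.arange(1, 8)  # 7 items: 1..7, not divisible by 3

# array_split tolerates the uneven length: chunk sizes are 3, 2, 2
parts = np.array_split(values, 3)
sums = [int(part.sum()) for part in parts]
print(sums)  # [6, 9, 13]

# split insists on equal chunks and raises ValueError here
try:
    np.split(values, 3)
    split_failed = False
except ValueError:
    split_failed = True
print(split_failed)  # True
```

One consequence is that the bins are not all the same width when the length is uneven, which may or may not matter depending on how the sums are used.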

NeStack