
I am totally lost on how to bin data efficiently in real time. What I am trying to do is assign each incoming value to a bin in a dictionary (or some other structure, if there is a more efficient one).

For example, if I know that the data ranges between 0 and 100 (or some other custom bounds) and I have ten bins, so that bin 1 covers 0 to 10 and so on, what would be the best implementation so that I can simply drop a value into the data structure and it will automatically know where to put it?

I've looked here, but that approach assumes you have all the data at once, not data that is arriving in real time.

My current design simply loops over the bins to find which one each value belongs to, but that is very slow when lots of data points come in, e.g. over 100k loop iterations.

user1234440
The data structure you're looking for is a [k-d tree](http://en.wikipedia.org/wiki/K-d_tree) (AKA a *binary search tree* if you only have one dimension). – roippi Apr 12 '15 at 19:43
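In one dimension with fixed bin edges, the search-tree idea reduces to a binary search over a sorted list of edges, which Python's standard `bisect` module already provides, so no explicit tree is needed. A minimal sketch, assuming the bins are given as a sorted list of edges; the names `edges`, `bin_index` and `counts` are illustrative, not from the comment. Unlike a direct division, this also handles bins of unequal width, at O(log n) per value:

```python
from bisect import bisect_right

# Hypothetical bin edges: bin k covers [edges[k], edges[k + 1]).
# The widths do not have to be equal.
edges = [0, 5, 10, 25, 50, 100]


def bin_index(value, edges):
    """Return the bin index for value, or None if it lies outside [edges[0], edges[-1])."""
    if value < edges[0] or value >= edges[-1]:
        return None
    # bisect_right counts the edges <= value; subtracting 1 gives the bin
    # whose lower edge is the largest edge <= value.
    return bisect_right(edges, value) - 1


# One counter per bin, updated as each value arrives.
counts = [0] * (len(edges) - 1)
for x in [3, 5, 12, 49.9, 50, 99, 100, -1]:
    k = bin_index(x, edges)
    if k is not None:
        counts[k] += 1

print(counts)  # [1, 1, 1, 1, 2]
```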

2 Answers


I think `bisect` may be what you want; this is based on the example in the `bisect` docs:

```python
from bisect import bisect

d = {"A": 0, "B": 0, "C": 0, "D": 0, "E": 0, "F": 0}


def grade(score, breakpoints=[70, 80, 90, 100], grades='FBCDA'):
    # bisect returns how many breakpoints are <= score,
    # which is the index of the bin the score falls into
    i = bisect(breakpoints, score)
    return grades[i]


for n in [66, 67, 77, 88, 80, 90, 91, 100]:
    d[grade(n)] += n
print(d)
```

Output:

```
{'A': 100, 'B': 77, 'C': 168, 'D': 181, 'E': 0, 'F': 133}
```
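`bisect` is an alias for `bisect_right`, so a value equal to a breakpoint lands in the higher bin (80 goes into the same bin as 88 above), which gives half-open bins such as [70, 80). If the upper edge should be inclusive instead, `bisect_left` can be used. A quick comparison, reusing the breakpoints from this answer:

```python
from bisect import bisect_left, bisect_right

breakpoints = [70, 80, 90, 100]
values = (69, 70, 80, 100)

# bisect_right: a value equal to an edge goes to the bin above it,
# so the bins are [70, 80), [80, 90), ...
right = [bisect_right(breakpoints, v) for v in values]

# bisect_left: a value equal to an edge goes to the bin below it,
# so the bins are (70, 80], (80, 90], ...
left = [bisect_left(breakpoints, v) for v in values]

print(right)  # [0, 1, 2, 4]
print(left)   # [0, 0, 1, 3]
```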
Padraic Cunningham

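I wrote this so that bin 0 = [Min, Min + (Max-Min)/Nbins): each bin includes its lower edge and excludes its upper edge, and values outside [Min, Max) are rejected.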
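Written out with bin width w, bin k covers a half-open interval, and the index comes from a single division rather than a loop over the bins, which is what `AddToMap` computes (for n >= Min, Python's `int()` truncation equals the floor):

```latex
w = \frac{\mathrm{Max} - \mathrm{Min}}{N_{\mathrm{bins}}}, \qquad
\text{bin } k = \bigl[\mathrm{Min} + k\,w,\ \mathrm{Min} + (k+1)\,w\bigr), \qquad
k = \left\lfloor \frac{n - \mathrm{Min}}{w} \right\rfloor
```

For the question's case (Min = 0, Max = 100, Nbins = 10) this gives w = 10, so n = 37 lands in bin k = floor(3.7) = 3.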

```python
class bins():
    def __init__(self, Min, Max, Nbins):
        self.Min = float(Min)
        self.Max = float(Max)
        self.Nbins = Nbins
        # One list per bin, keyed by bin index 0 .. Nbins-1
        self.bins = {k: [] for k in range(Nbins)}

    def AddToMap(self, n):
        if n < self.Min or n >= self.Max:
            print("Object out of map range. [ " + str(n) + " ]")
        else:
            # Compute the bin index directly instead of looping over the bins
            k = int((n - self.Min) / ((self.Max - self.Min) / self.Nbins))
            # Floating-point rounding can give k == Nbins for n just below Max
            k = min(k, self.Nbins - 1)
            self.bins[k].append(n)

    def prt(self):
        for k in self.bins:
            print(self.bins[k])


b = bins(0, 100, 10)
for value in [1, 13, 21, 14, 13, 9, 11, 10, 0, 100, 42]:
    b.AddToMap(value)

b.prt()
```

This yields:

```
Object out of map range. [ 100 ]
[1, 9, 0]
[13, 14, 13, 11, 10]
[21]
[]
[42]
[]
[]
[]
[]
[]
```
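If only the number of values per bin matters (a histogram), storing counts instead of every value keeps memory constant no matter how many points arrive. A minimal sketch along the same lines as the class above, assuming equal-width bins over [Min, Max); the class name `BinCounter` is hypothetical, and out-of-range values are tallied rather than printed:

```python
class BinCounter:
    """Hypothetical equal-width histogram over [lo, hi), updated one value at a time."""

    def __init__(self, lo, hi, nbins):
        self.lo = float(lo)
        self.hi = float(hi)
        self.nbins = nbins
        self.width = (self.hi - self.lo) / nbins
        self.counts = [0] * nbins  # one integer per bin instead of a list of values
        self.out_of_range = 0

    def add(self, x):
        if x < self.lo or x >= self.hi:
            self.out_of_range += 1
            return
        k = int((x - self.lo) / self.width)
        # Guard against floating-point rounding pushing k to nbins
        # for values just below hi.
        self.counts[min(k, self.nbins - 1)] += 1


h = BinCounter(0, 100, 10)
for x in [1, 13, 21, 14, 13, 9, 11, 10, 0, 100, 42]:
    h.add(x)

print(h.counts)        # [3, 5, 1, 0, 1, 0, 0, 0, 0, 0]
print(h.out_of_range)  # 1
```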
kpie