4

I have an array of objects, each with a time and a value property. It looks something like this.

UPDATE: dataset with epoch times rather than time strings

[{datetime: 1383661634, value: 43}, {datetime: 1383661856, value: 40}, {datetime: 1383662133, value: 23}, {datetime: 1383662944, value: 23}]

The array is far larger than this, possibly six digits in length. I intend to build a graph to represent this array. For obvious reasons, I cannot use every data point to build this graph (value vs. time), so I need to normalize it across time.

So here's the main problem: there is no regular pattern to the timestamps of these objects, so I need to dynamically choose slots of time in which I either average the values or show counts of the objects in that slot.

How can I calculate slots that are user friendly, i.e. per minute, hour, eight hours, day, or so? I am looking at having a maximum of 25 slots drawn from the array, which I then show on the graph.

I hope this helps get my point through.
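For illustration, the kind of slot selection I have in mind could be sketched like this (the function name and the interval list are my own, just to show the idea):

```python
# Sketch: pick the smallest "user friendly" interval that fits the whole
# time span into at most 25 slots. The interval list is an assumption.
NICE_INTERVALS = [60, 5 * 60, 15 * 60, 3600, 8 * 3600, 24 * 3600, 7 * 24 * 3600]  # seconds

def pick_slot_size(timestamps, max_slots=25):
    span = max(timestamps) - min(timestamps)
    for interval in NICE_INTERVALS:
        if span / interval <= max_slots:
            return interval
    return NICE_INTERVALS[-1]  # fall back to the coarsest interval

data = [1383661634, 1383661856, 1383662133, 1383662944]
print(pick_slot_size(data))  # span is ~22 minutes, so per-minute slots fit
```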

Omkar Khair
  • is this question about saving memory (bytes) or a question about displaying the values in a user-friendly manner? – sled Nov 05 '13 at 14:40
  • @HighPerformanceMark my bad. I have manually coughed up a demo data set. – Omkar Khair Nov 05 '13 at 14:45
  • @sled this is about displaying values in a user-friendly manner. To show 100,000 values on a graph I need to compress the time scale. I need some direction in compressing this time scale in a user friendly manner. – Omkar Khair Nov 05 '13 at 14:47
  • mhhh this is just a quick thought (no guarantee it will work): build a binary/sorted tree first and then start from the leaves, merging the leaves into buckets containing multiple items until you reach 25 buckets at the leaf level. These buckets then represent points on your timeline which may contain multiple items. Very similar to a BTree. – sled Nov 05 '13 at 14:56
  • @sled seems close, but I need buckets across the time scale, which is pretty much sorted. What you propose would be very helpful for creating buckets of values. Am I right? – Omkar Khair Nov 05 '13 at 15:01
  • what about using intervals as values? As an example: the first branch spreading out from the root would be `(max_date - min_date)/2`, and then you split the items up into these two intervals and divide and branch again; when you're finished you start to merge up... – sled Nov 05 '13 at 15:03
  • 1
    Or another idea, you could try the k-nearest neighbors algorithm to build clusters along the timeline. – sled Nov 05 '13 at 15:05
  • Are you looking for a function to map a datetime to a bucket number? – mbeckish Nov 05 '13 at 16:25
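A function along the lines of mbeckish's comment, mapping an epoch timestamp to a bucket number, could be as simple as this sketch (names are mine, and a fixed slot size in seconds is assumed):

```python
def bucket_index(ts, start, slot_size):
    """Map an epoch timestamp to a 0-based bucket number for a fixed slot size."""
    return (ts - start) // slot_size

# With one-minute slots starting at the first timestamp in the question:
start = 1383661634
print(bucket_index(1383662944, start, 60))  # 1310 seconds later -> bucket 21
```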

3 Answers

1

You can convert the date/times into epoch values and use numpy.histogram to get the bin ranges:

import random
import numpy

# demo data: 1000 random values standing in for epoch timestamps
l = [random.randint(0, 1000) for _ in range(1000)]
num_items_bins, bin_ranges = numpy.histogram(l, 25)
print(num_items_bins)
print(bin_ranges)

Gives:

[34 38 42 41 43 50 34 29 37 46 31 47 43 29 30 42 38 52 42 44 42 42 51 34 39]
[    1.      40.96    80.92   120.88   160.84   200.8    240.76   280.72
   320.68   360.64   400.6    440.56   480.52   520.48   560.44   600.4
   640.36   680.32   720.28   760.24   800.2    840.16   880.12   920.08
   960.04  1000.  ]
perreal
  • The issue isn't with datetime conversion. I have the resultant epoch time with me. It is more about compressing 100,000 values on a time scale, to make it more user friendly. Maybe I'll update the question with epoch times in the data. – Omkar Khair Nov 05 '13 at 14:48
  • I'm not sure how to use this on my data. I need to compress on the time scale, basically to show results in slots/buckets of minutes/days/hours etc. – Omkar Khair Nov 05 '13 at 15:21
  • apply the histogram on the epochs, later you can convert them to days, minutes... – perreal Nov 05 '13 at 15:41
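Concretely, applying the histogram to the epochs from the question and averaging the values per bin might look like this (a sketch; the three-bin choice and the variable names are mine):

```python
import numpy as np

data = [{"datetime": 1383661634, "value": 43},
        {"datetime": 1383661856, "value": 40},
        {"datetime": 1383662133, "value": 23},
        {"datetime": 1383662944, "value": 23}]

times = np.array([d["datetime"] for d in data])
values = np.array([d["value"] for d in data], dtype=float)

# Count items per time bin, then sum the values per bin via `weights`.
counts, edges = np.histogram(times, bins=3)
sums, _ = np.histogram(times, bins=edges, weights=values)

# Average value per bin (an empty bin would produce NaN).
with np.errstate(invalid="ignore"):
    averages = sums / counts
print(averages)  # averages per bin: 41.5, 23.0, 23.0
```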
1

Hard to say without knowing the nature of your values; compressing values for display is a matter of what you can afford to discard and what you can't. Some ideas, though:

  1. histogram
  2. candlestick chart
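For the candlestick option, each time bucket would keep its open/high/low/close rather than a single average; a minimal sketch (function name is mine):

```python
def candle(bucket_values):
    """Summarize one time bucket as (open, high, low, close)."""
    return (bucket_values[0], max(bucket_values),
            min(bucket_values), bucket_values[-1])

# The question's sample values, treated as one bucket:
print(candle([43, 40, 23, 23]))  # (43, 43, 23, 23)
```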
Lie Ryan
0

Is this JSON and the DateTimes transmitted as text?

Why not transmit the date as a long (Int64), and use a method to convert to/from DateTime? Depending on your language, you could use one of these implementations:

That alone would save you a considerable amount of space, since strings are 16 bits per character (in UTF-16) while the long timestamp would be just 64 bits.
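In Python, for instance, the round trip between the long and a date/time is a one-liner each way (shown purely as an illustration, since the question's language isn't stated):

```python
from datetime import datetime, timezone

epoch = 1383661634  # first timestamp from the question
dt = datetime.fromtimestamp(epoch, tz=timezone.utc)
print(dt.isoformat())  # 2013-11-05T14:27:14+00:00

# Converting back recovers the same 64-bit value.
assert int(dt.timestamp()) == epoch
```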

Dr. Andrew Burnett-Thompson
  • The issue isn't with datetime conversion. I have the resultant epoch time with me. It is more about compressing 100,000 values on a time scale, to make it more user friendly. – Omkar Khair Nov 05 '13 at 14:47