2

My code captures numeric values from web services. The captured values are generally increasing, but sometimes they decrease. For now, I just do a dumb capture of what I have observed at a specific time... this is what my table structure looks like now:

id   date_collected      value1_observed  value2_observed
1    2012-01-26 1:00am   500              12
2    2012-01-26 1:01am   90000            NULL
3    2012-01-26 1:02am   100              50
1    2012-01-26 7:23am   502              12
2    2012-01-26 7:24am   90105            NULL
3    2012-01-26 8:09am   97               55

You can see that...

  • ID=1 value1 increased by 2 in 6 hours 23 minutes
  • ID=2 value1 increased by 105 in 6 hours 23 minutes
  • ID=3 value1 decreased by 3 in 7 hours 8 minutes

Difficulties:

  • The collection time is inconsistent (I collect multiple times per day with a varying length of time between collections)
  • the values can increase or decrease or be null

I have captured tens of millions of rows of these records and I'd like to be able to query them more easily, so I'm thinking I need to transform this data into a schema that is better suited for the questions I want to answer. I'd like to be able to find patterns like:

  • "Between the hours of 1am and 5am, these IDs had the largest changes in value1".
  • "Over the last 30 days, which IDs have had value2 grow the most (in absolute and relative terms)?"

Are there any suggestions on how I should structure the schema? Would it be possible to transform this data to a traditional star schema?

UPDATE to answer questions:

  • I am currently using Microsoft SQL Server, but am open to using MySQL or other open source options.
  • If id 1 changes from 50 to 100 to 52, I would like to know that it went +50 and then -48. That way I could see fluctuation and net gain/loss over time. For example, if I were tracking registered users on a site... generally that numbers always goes up, but if people delete their accounts, I would want to identify that trend. However, if my app didn't observe the change from 50 to 100 and only recorded 50 and 52, it would not know that the number ever hit 100.

Thanks for any insight/direction,

-Paul

PaulMest
  • 12,925
  • 7
  • 53
  • 50
  • 1
    Please specify also the database you're using – Lukas Eder Jan 27 '12 at 09:16
  • 1
    With respect to "the largest changes in value 1", if value 1 changes from 50 to 100 and then back to 52 during your interval, is that a change of 2 or a change of 98? – Mike Sherrill 'Cat Recall' Jan 27 '12 at 12:21
  • I have updated the posted to answer Lukas Eder's and Catcall's questions. – PaulMest Jan 27 '12 at 21:59
  • Is there any relationship between value1 and value2? How come the id values keep repeating, what do they represent sampling devices or sampling operations? – Stephen Senkomago Musoke Mar 01 '12 at 17:52
  • @ssmusoke value1_observed and value2_observed are point in time values affiliated with the ID. Think of ID representing a city and then value1_observed could be the temperature and value2_observed could be the barometric pressure. – PaulMest Mar 06 '12 at 02:15

1 Answers1

0

It seems like your options may be to go for a column based data store for your data see this thread What is the best way of storing trend data?

These are MySQL compatible engines, so I suspect that the changes to your infrastructure may not be major.

Community
  • 1
  • 1
Stephen Senkomago Musoke
  • 3,528
  • 2
  • 29
  • 27