
I have a SQL Server database that stores measurements from some sensors. The table looks like this:

| MEASUREID | SensorID* | Temperature | Humidity | dateandTime |
|-----------|-----------|-------------|----------|-------------|

MEASUREID is the PK (an integer generated automatically by SQL Server, incremented by 1 for each new record) and SensorID is the FK. The table contains a huge amount of data and is updated relatively often: an update means new measurements have to be INSERTED into the table.
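
For reference, a minimal sketch of what such a table could look like in T-SQL; the column types and the referenced Sensors table are assumptions based on the description above, not the actual schema:

```sql
-- Hypothetical schema matching the description above; real names/types may differ.
CREATE TABLE dbo.Measurements (
    MeasureID   INT IDENTITY(1,1) PRIMARY KEY,       -- PK, +1 for each new record
    SensorID    INT NOT NULL
        REFERENCES dbo.Sensors (SensorID),            -- FK to an assumed Sensors table
    Temperature DECIMAL(5,2) NOT NULL,
    Humidity    DECIMAL(5,2) NOT NULL,
    DateAndTime DATETIME2    NOT NULL
);
```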

It could happen that the measurements we want to save already exist in the DB, and we don't want any duplicates to be stored.

Possible solutions I figured out:

  • Checking every time whether a measurement already exists would kill performance.
  • I read about indexes. They should be pretty efficient for big tables, but at the same time they are often said to be less advisable when INSERT/UPDATE operations happen very often.
  • I thought of using a combination of a hash function and an index: the MeasureID could be the result of a hash function, MID = f(sensID, temp, humid, dateandtime). MID would be unique and could be indexed: when a new measurement (temperature, humidity, etc.) has to be inserted, you compute the hash function to produce the corresponding MID and can easily check whether that MID already exists (see the sketch after this list).
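
A minimal sketch of that hash idea in T-SQL, assuming the hypothetical schema above: a persisted computed column stores a hash of the measurement values, a unique index on it rejects duplicates, and the insert checks the hash first. Note that here the hash is a separate indexed column rather than replacing the identity PK, and all names are assumptions, not an existing schema.

```sql
-- Hypothetical sketch of the hash idea; table/column names are assumptions.
-- A persisted computed column holds a hash of the measurement values.
ALTER TABLE dbo.Measurements ADD MeasureHash AS
    CONVERT(BINARY(32), HASHBYTES('SHA2_256',
        CONCAT(SensorID, '|', Temperature, '|', Humidity, '|',
               CONVERT(VARCHAR(33), DateAndTime, 126))))
    PERSISTED;

-- A unique index on the hash rejects duplicate measurements at the database level.
CREATE UNIQUE INDEX UX_Measurements_MeasureHash
    ON dbo.Measurements (MeasureHash);

-- Guarded insert: recompute the hash for the incoming values and skip the row if it exists.
-- (The unique index remains the real guarantee if two sessions insert concurrently.)
DECLARE @SensorID    INT          = 3,
        @Temperature DECIMAL(5,2) = 21.40,
        @Humidity    DECIMAL(5,2) = 55.00,
        @DateAndTime DATETIME2    = '2016-03-11T13:00:00';

INSERT INTO dbo.Measurements (SensorID, Temperature, Humidity, DateAndTime)
SELECT @SensorID, @Temperature, @Humidity, @DateAndTime
WHERE NOT EXISTS (
    SELECT 1
    FROM dbo.Measurements
    WHERE MeasureHash = CONVERT(BINARY(32), HASHBYTES('SHA2_256',
        CONCAT(@SensorID, '|', @Temperature, '|', @Humidity, '|',
               CONVERT(VARCHAR(33), @DateAndTime, 126))))
);
```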

What do you think about it? Do you have any other ideas? Thanks :)

  • Possible duplicate of [Equivalent of MySQL ON DUPLICATE KEY UPDATE in Sql Server](http://stackoverflow.com/questions/27076348/equivalent-of-mysql-on-duplicate-key-update-in-sql-server) – Eray Balkanli Mar 11 '16 at 13:41
  • "checking every time if a measure already exists will kill the performance" .. why won't hashing kill it too? – TheGameiswar Mar 11 '16 at 13:43
  • Because you have O(1) instead of O(log n) – rugrag Mar 11 '16 at 13:47
  • I think constraints will also follow the same logic, but I am not sure – TheGameiswar Mar 11 '16 at 13:48
  • We need to know a bit more about what "relatively often" really means, and how much data you actually have. Certainly the simplest solution would be based on a simple unique index, and it's reasonably likely, in my experience, that the performance hit would be acceptable. Presumably you could try this on your test system and measure it? – Matt Gibson Mar 11 '16 at 14:06
  • Approximately 7-10 sensors. They make a measurement every hour, every day. The goal is to produce statistics about how these values change as time passes. It will quite soon be a very HUGE amount of data. Right now it still works well even if you check every single record, but it won't in the future – rugrag Mar 11 '16 at 14:16
  • Ten sensors, a measurement every hour -- that wouldn't even hit ten million records if you recorded for a hundred years. That's no data at all. – Matt Gibson Mar 11 '16 at 15:28
  • What does that mean? – rugrag Mar 11 '16 at 15:39
  • I mean that adding an index to a five-column table where you're only adding ten rows an hour is unlikely to have any significant performance impact. You're unlikely to even be able to *measure* a performance difference before/after an index, let alone have it cause you a problem. – Matt Gibson Mar 11 '16 at 17:25
