31

I'm a little confused about data consistency and data integrity. From Oracle Database Concepts:

data integrity
--------------
Business rules that dictate the standards for acceptable data. These rules
are applied to a database by using integrity constraints and triggers to
prevent invalid data entry.

From Wikipedia

Consistency
-----------
Consistency states that only valid data will be written to the database.

So what's the difference between data consistency and data integrity?

Thanks in advance.

Just a learner
  • 26,690
  • 50
  • 155
  • 234

5 Answers5

36

They are not only different, they are orthogonal.

Inconsistency:
A DB that reported employee Joe Shmoe's department as Sales but that didn't list Joe Shmoe among the employees in the Sales department would be inconsistent.
It's a logical property of the DB, independent of the actual data.

Integrity:
A DB that reported jOe SaleS to be a member of the Shmoe department would lack integrity.
jOe SaleS isn't a valid employee name and Shmoe isn't a valid department.
That's not logically invalid, but it is invalid relative to the rules that govern data content.

Jim Balter
  • 16,163
  • 3
  • 43
  • 66
  • I see that "primary key" integrity constraint is intended to eliminate the consistency issue that you have demonstrated. It says that employee has a foreign key that refers a department then the department must have the employee listed. It is like [syntax may somehow enforce the semantic integrity](http://cs.stackexchange.com/questions/51189/ambiguity-vs-context-sensitivity). – Valentin Tihomirov Mar 10 '16 at 07:58
  • I mean that if one enhances the other then we cannot say that they are absolutely orthogonal and don't correlate. I mean that that the claim of orthogonality and the issue of the difference between two needs further clarification. – Valentin Tihomirov Mar 10 '16 at 08:06
  • Driveby downvote by some sociopath on this date. As for the comments above, I never said anything about "absolutely" or that they cannot "correlate" ... of course there are instances of solutions that address both problems; that's a given for orthogonality. – Jim Balter Sep 19 '17 at 07:14
  • 1
    I also question the use of the term "orthogonal". All the usual definitions of the word involve some notion of independence or irrelevance of one subject to the other, but we don't really have that here. Specifically, it's hard to imagine how one can have data integrity without having data consistency. Because there's a dependence, the concepts are not orthogonal. – Ted Hopp Jan 30 '18 at 20:24
  • Good luck with business rules that guarantee the consistency of a DB that contains both a list of distances between cities and a "shortest path" field -- that's NP-Hard. Or a DB that contains a list of programs and a boolean "terminates" field for each one -- that consistency is undecidable. – Jim Balter Jan 31 '18 at 01:25
  • Sorry, but that's nonsense. Data consistency doesn't require solving NP-hard or undecidable problems. In each of the cases you suggest, it's easy enough to include an "undetermined" or "unknown" value in the data model for those fields. Data consistency would require, for instance, that the data base invalidate the "shortest path" field if the distance between two cities changes (if, say, a surveying error was corrected). Put it another way: good luck with ensuring data integrity if fields in the data base can be updated in inconsistent ways. – Ted Hopp Feb 01 '18 at 04:09
  • That comment doesn't address what I wrote. – Jim Balter Feb 01 '18 at 07:49
  • In database terminology, data consistency is not synonymous with data correctness. What you've described (e.g., whether the shortest path stored in the data base is actually consistent with the inter-city distances also stored in the data base) is not data consistency; it's data integrity. The very [definition of data integrity](https://en.wikipedia.org/wiki/Data_integrity) emphasizes the dependency: "the maintenance of, and the assurance of the **accuracy and consistency of**, data over its entire life-cycle." (emphasis added). You can't have integrity without consistency. – Ted Hopp Feb 01 '18 at 16:43
  • Just to further elaborate: the [definition of consistency in database systems](https://en.wikipedia.org/wiki/Consistency_%28database_systems%29) doesn't require that the data be consistent from a business perspective, just that transactions do not violate any database constraints, cascades, or triggers. In particular, "[Consistency] does not guarantee correctness of the transaction in all ways the application programmer might have wanted". – Ted Hopp Feb 01 '18 at 16:46
10

Very roughly, my 2c:

Integrity = strong types, no illegal values as determined by the data model & constraints, foriegn keys, unique constraints and stuff like that.

Consistency = being able to read only committed data a given point in time, not the intermediate steps.

lll
  • 297
  • 1
  • 3
8

Data can be entirely consistent and entirely wrong. Data integrity is more about the quality of data and goes well beyond data base management systems. DBMSs provide data consistency tools that can help with data integrity; they are one piece of the larger problem. Integrity constraints and triggers help ensure that a DBMS does not degrade (or, if you're a pessimist, further degrade) the integrity of the data that arrives.

The way to parse the Oracle verbiage is that integrity constraints and triggers are a way to implement a subset of the business rules that define data integrity.

Ted Hopp
  • 232,168
  • 48
  • 399
  • 521
2

From here

Data consistency

Execution of transaction in isolation preserves the consistency of the data. The data value stored in the database must satisfy certain consistency constraints. For example the balance on an account should not fall below $100. The DDL commands provide the facilities to specify such constraints. The database systems check these constraints every- time the database is updated. In general it can be an arbitrary predicate pertaining to the database. To check the consistency of database most of programmer depends on the constraints and these are usually costly to test.

Data integrity

The integrity of data refers to as "whole data at one place". The data is in the form of tables and each table has some attributes. These are the cases where we wish to ensure that a value that appears in one relation for given set of attributes should also appear for a certain set of attributes in another relation (referential integrity). Database modification can cause violation of integrity. When the integrity constraint is violated, the normal procedure rejects the action that has actually caused the violation.

enter image description here

You can read first chapter for ACID.

berkay
  • 3,907
  • 4
  • 36
  • 51
1
  1. Consistency: You don’t have dates, number and letters in same column but only one of them. This is forsed by defining column type, eg. column id number (5) means there won’t be dates nor letters in it and numbers greater then 99999
  2. Integrity: If you have emplooyes table and departments table and department_id column in both of them (with meaning of department_id in witch employee is working) there won’t be any employee working in department that doesen’t exists in departments table. This is forsed by referencal integrity constraint.