0

I'm designing a table. One field apparently can have either numeric input or text input (e.g., 700 or 789.0123, or Trump or Clinton), depending on the subject matter of the application.

Would it be better to store the numeric inputs as varchar text (e.g., "700.0000" or "789.0123" formatted to 4 decimal places (or some other common number of decimal places) in the same field as the obvious text input (Trump or Clinton), or would it be better to have two fields and have the several applications (and applications programmers) direct their data to the appropriate field?

rsenn
  • 9
  • 2
    "Better" is entirely arbitrary here. There are situations where either form would be correct. My *bias* is toward two columns, but it is impossible to say without understanding the data and the application. – Gordon Linoff Jun 07 '16 at 00:26
  • What does the input format and/or mode have to do with the conceptual value? Find a design for the user's state of the system first. Then define input functionality in terms of it. Maybe the numbers and names should even be in different tables. – philipxy Jun 07 '16 at 08:55

3 Answers3

1

What is the meaning or definition of the field, and what will you do with it?

A field that can be either "Trump" or "789" raises alarm bells for me as sounding very much like you are using a single field to hold totally different data.

If the field is something like, "identifier of the uploaded document", and sometimes the user gives a text description, and sometimes he gives the Dewey Decimal code for a book, then it's fine.

But if the definition is something like, "when the record type is 1, this is a State record and this field is the name of the candidate who won the primary, but when the record type is 4 than this is a Precinct record and this field is the number of registered voters in that precinct", I can only say ... don't. Make two separate fields.

What advantage is there to combining two complete different fields into one? Save having to add a field to your table? It's going to cause all sorts of difficulty. If you ever need to sort by the numeric value or do a range check, it's not going to work, because of course the number 780 > 82 but the string '780' < '82', etc. You can't do arithmetic on a varchar field: '780' + 23 is either an error or '78023', depending on the flavor of SQL. And for human beings trying to make sense of your code ... what do you even call this field?

I don't know your application. It's possible that it makes sense in context. Perhaps if you told us what the field represents it would make sense. It certainly looks suspicious.

yassadi
  • 524
  • 1
  • 9
  • 20
Jay
  • 26,876
  • 10
  • 61
  • 112
0

The answer depends on your desired level of Normalization. If you desire 1NF then you need to design attributes that are atomic. In other words, if you never plan to use any subdivision of the attribute, then by all means combine them together. If you don't need to validate, or sort or query by any subset of the attribute then you can combine the two seemingly disparate data items.

nicomp
  • 4,344
  • 4
  • 27
  • 60
  • What does "subset of the attribute" mean? (Maybe you mean part/subdivision of the attribute?) What do normalization, "atomicity" or subdivision have to do with the question? (Maybe you are modeling the values as number-string pairs?) – philipxy Jun 07 '16 at 09:03
  • @philipxy You're pretty much right. "Atomicity" has everything to do with the question. Obviously the OP can put anything he wants into his attributes: I am explaining why he might not want to combine two data items that he will probably end up breaking apart when he implements his application. If he doesn't care about Normalization, then he can disregard what I wrote. Not all of us care about Normalization: perhaps he is designing a data warehouse. – nicomp Jun 07 '16 at 09:34
  • I am confused by "You're pretty much right." (Maybe, about my guesses of what you meant?) The straightforward design for the question is an attribute of string type or type integer U string or types integer U {NULL} and string U {NULL}, which one might want to transform to another design, but those transformation don't involve normalization, atomicity or subdivision. So I would say that the only normalization to 1NF that you are doing is to undo your unnecessary complication of introducing a pair type attribute. (But it's not clear what "normalization" you mean would be going on.) – philipxy Jun 07 '16 at 10:56
  • You are pretty much right about your guesses of what I meant. When I refer to Normalization I am writing about the work of Dr. E. F. Codd, who is credited with defining the levels of Relational Database Normalization: 1NF, 2NF, 3NF, etc. If the OP plans to differentiate the the attribute based on some subset of the contents then by definition the attribute is no longer atomic and that is, by definition, a violation of 1NF. – nicomp Jun 07 '16 at 11:37
  • Thanks. My comments are informed by a good knowledge of [the various senses in which "normalized" & related terms have been used](http://stackoverflow.com/a/24038895/3404097). Hence the only normalizing (in any sense) that you might be doing is *un*doing an unnecessary complication of introducing something like a (number U {null}) X (name U {null}) attribute. The question has straightforward designs not requiring any normalizing. (Including one attribute for numbers & one for names, without nulls.) Some use types that are *unions* of types. But not *products* of types with value "subdivision". – philipxy Jun 07 '16 at 12:57
0

In a well formed database, every field's contents are an instance of an attribute, and an attribute will have a specific domain, the set of values that are possible. It's very unusual to have a domain that spans multiple datatypes. It's more likely that you are trying to store multiple attributes in a single column.

This is nearly always bad design. You are better off to start with a conceptual model of the data, one which will clarify both the form and the semantics of every attribute. Then design columns that match, and tables that contain appropriate columns.

Normalize your tables unless you have a real good reason not to. (this does happen in real life, but if you are a newbie, don't start out that way).

Walter Mitty
  • 18,205
  • 2
  • 28
  • 58