3

Situation:

varchar(20) seems to truncate silently in Teradata and not to expand or complain when encountering strings larger than 20 characters long... This is a bit of a surprise as I expected either automatic expansion of the column to fit larger strings, say 30 characters, OR for an error to be thrown if a larger string were encountered. Silent truncation seems to get me the worst of all worlds...

Complication:

For my application (prototype analytics design) I don't know in advance how large will be the data I will be ingesting over the course of a few weeks. That seems to rule out using varchar(N), except for max

Questions:

So now I have a few choices, and am looking for some guidance:

Q1. User error? Am I misunderstanding a key concept about varchar(N)?

If this is in fact how Teradata handles varchar fields, then

Q2. why would anyone specify anything less than varchar(max) especially when it is not clear in advance how many characters might need to be stored in the field.

Q3. Is there a different data type that permits flexible sizing of the string -- i.e. a true variable length character string?

If I recall, other SQL dialects implement varchar(n) as a recommended initial size for the string but allow it to expand as needed to fit the maximum length of the data strings thrown in. Is there a similar data type in Teradata?

(Note: since I'm prototyping the tables, I am less concerned about performance efficiency at this point; more concerned about quick but safe designs that allow the prototype to progress.)

Assad Ebrahim
  • 6,234
  • 8
  • 42
  • 68

2 Answers2

2

I am not familiar with any dialect of SQL that implements a varchar(n) that behaves as you suggest -- a recommended initial size and then letting it grow. This would apply to Oracle, SQL Server, MySQL, and Postgres. In all these databases, varchar(n) behaves pretty much as you see it behave in Teradata in SELECT statements with explicit casts. I don't believe any cause a truncation error when a longer string is placed into a shorter string.

As Branko notes in his comment, the behavior is different in data modification steps, where an implicit cast does cause an error.

I am not familiar with all the details of Teradata. In SQL Server, there is historically a world of difference between varchar(max) and varchar(8000). The former would be allocated on a separate data page, with the latter allocated on the same page as the data. (The rules have been modified in more recent versions so varchars can spill off the data page.)

In other words, there may be other considerations when using varchar(max), involving how the data is stored on pages, how indexes are built on them, and perhaps other considerations.

My suggestion is that you pick a reasonably large size, say 1000 or so, and let the application continue from there. If you want real flexibility, then use varchar(max). You should also investigate through Teradata documentation and/or technical contacts what the issues are with declaring very large strings.

Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786
  • In my application, it's the difference between say varchar(20) and varchar(255). None of the fields are free text. They're all logically limited text strings, something like names, states, etc. Sounds like in that case I'm better off just putting in 255 and leaving it at that. – Assad Ebrahim Aug 23 '12 at 04:36
  • Given the silent truncation behaviour, perhaps pre-testing the max length of strings in the table prior to entering into the table. Perhaps as part of the data integrity / validation routines. – Assad Ebrahim Aug 23 '12 at 04:38
  • How about varchar() with no N? Does that just use a default value of N? – Assad Ebrahim Aug 23 '12 at 04:40
  • 2
    _"I don't believe any cause a truncation error when a longer string is placed into a shorter string."_ Actually, truncation **is** reported as error in must DBMSes I know. You can try this [SQL Fiddle](http://sqlfiddle.com/#!3/3a51f/1) and cycle through the available DBMSes - you'll see that the MS SQL Server, MySQL, Oracle and PostgreSQL all issue an error when attempting to INSERT a string that is too long. The only exception is SQLite which doesn't seem to enforce the VARCHAR length at all. – Branko Dimitrijevic Aug 23 '12 at 09:46
  • 1
    @BrankoDimitrijevic . . . Interesting. I tested something like "select cast('abcde' as varchar(2))", which didn't report an error. I guess the answer to the question depends on whether the truncation is happening in a data modification step or a select. – Gordon Linoff Aug 23 '12 at 13:01
  • @BrankoDimitrijevic - Thanks for the link to SQL Fiddle - that's a great exploration platform. (+1) (Too bad they don't have Teradata as a supported DBMS.) – Assad Ebrahim Aug 24 '12 at 06:24
  • @GordonLinoff: +1 for the `cast as varchar` scenario in which truncation **is** silent -- that's what's happening in my query. – Assad Ebrahim Aug 24 '12 at 06:40
  • @GordonLinoff: did you mean to say (in your edit) *"where an implicit case does NOT cause an error"?* – Assad Ebrahim Aug 24 '12 at 06:45
  • Accepting this as the correct answer since it covers most of the questions. Found a few helpful elaborations in this question about [the significance of the length of `N` used in `varchar(N)`][1] and the answer that there is no significant performance difference between `varchar(50)` and `varchar(200)`. http://stackoverflow.com/q/1962310/181638 – Assad Ebrahim Aug 24 '12 at 06:53
  • Another helpful answer: this one discusses design alternatives for when `N` is large, and considers impact in various scenarios. http://stackoverflow.com/a/4204757/181638 – Assad Ebrahim Aug 24 '12 at 06:58
2

Teradata works in two modes : Teradata (BT; .. ET;) and ANSI(commit;). They have list of differences and one of them you've met during development -- Teradata mode allows truncation of display data. On contrary - ANSI forbids such truncation, so, you'll see an error. To get the idea, just use simple example: create table check_exec_mode (str varchar(5)) ; select * from check_exec_mode ; insert into check_exec_mode values ('123456') ; If you configure connections of your teradata client (e.g., Teradata Studio Express) in TMODE(transaction mode)=TERA, then you'll get as a result one truncated row in the table ('12345'). Changing transaction mode to ANSI and executing insert statement , will lead you to the error "Right truncation of string data".

Dima
  • 41
  • 3