
I have a large SQL Server database with about 40 columns and hundreds of millions of rows.

This table is supposed to be loose in schema, so I have a lot of columns as VARCHAR(MAX) even where they could have been BIGINT, DATETIME, INT, etc. Does this have an impact on query time/efficiency? E.g., will

SELECT TOP 100 * FROM MyTable WHERE CustomerId = 34343

be faster than

SELECT TOP 100 * FROM MyTable WHERE CustomerId = '34343'

? If yes, how much faster?

And what if I use VARCHAR(MAX) instead of fixed-length VARCHAR? And what about other databases like MySQL in this regard?

marc_s
Hari Menon
  • I shudder at the idea that a table should be loose in schema. That is a mistake of mammoth proportions. Flexibility or performance - pick one. I guarantee your users want performance more than a flexible schema. Save flexibility for rare cases, not a major part of your system design. It's bad enough to use VARCHAR for numbers, which means you will have to convert them back to numbers to do any reporting calculations (and you will have bad data integrity as pretty much a guarantee), but using a VARCHAR for dates is even worse - unless you like trying to interpret how to handle 02/30/2010. – HLGEM Aug 09 '10 at 14:57
  • 1
    You don't say you are using a n EAV table, but this article may point out to you why loose schemas are a bad design - you wilhave to get data back out sometime and query this mess: http://www.simple-talk.com/opinion/opinion-pieces/bad-carma/ – HLGEM Aug 09 '10 at 14:59
  • Actually, we get garbage data at times and we need to log even that. So we might get customerId = 568d4, which won't get logged if I have customerId as numeric. We are trying to figure something out, though. – Hari Menon Aug 10 '10 at 06:20
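One common way to reconcile "log everything, even garbage" with typed columns is a staging pattern: land the raw text, then promote the rows that parse. This is a hedged sketch (table and column names are illustrative, not from the question) using TRY_CONVERT, which exists in SQL Server 2012 and later:

```sql
-- Illustrative staging table: land everything as text first.
CREATE TABLE CustomerLog_Staging (
    CustomerIdRaw VARCHAR(50),                     -- accepts garbage like '568d4'
    LoggedAt      DATETIME2 DEFAULT SYSDATETIME()
);

-- TRY_CONVERT returns NULL instead of raising an error, so bad rows
-- stay available for auditing while clean rows get a real BIGINT.
SELECT CustomerIdRaw,
       TRY_CONVERT(BIGINT, CustomerIdRaw) AS CustomerId   -- NULL for '568d4'
FROM   CustomerLog_Staging;
```

Rows where the converted value is NULL but the raw value is not can be routed to an error table, while the rest are inserted into a properly typed main table.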

3 Answers3

3

Yes, comparing strings is usually slower than comparing pure numbers. Whether the difference is measurable depends on how the query execution engine does the comparison. If the engine does not have to compare to the end of the strings - which it often won't - then the penalty is not great. Try it and see. But in theory, you'd be better off with the numeric comparison for numeric quantities.
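There is a second, often larger cost specific to SQL Server: when a VARCHAR column is compared to an integer literal, data-type precedence forces the column side to be implicitly converted to INT, which can turn an index seek into a full scan. A sketch under assumed names (the table and index are illustrative):

```sql
-- Assume CustomerId is stored as VARCHAR(50) and is indexed.
CREATE INDEX IX_CustomerLog_CustomerId ON CustomerLog (CustomerId);

-- INT outranks VARCHAR in type precedence, so the COLUMN is converted:
-- the plan shows CONVERT_IMPLICIT and typically scans instead of seeking.
SELECT TOP 100 * FROM CustomerLog WHERE CustomerId = 34343;

-- A string-to-string comparison stays sargable; an index seek is possible.
SELECT TOP 100 * FROM CustomerLog WHERE CustomerId = '34343';
```

So with a loose VARCHAR schema, the literal in the second query form is actually the safer one; the real fix is storing the value as a numeric type in the first place.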

Jonathan Leffler
  • OK... I was hoping for some quantitative report on the effect, if somebody has a source for that. Thanks! – Hari Menon Aug 09 '10 at 09:02

Yes, there's definitely a performance benefit from using INT vs. VARCHAR(MAX) for comparisons. How much faster is hard to say without actually measuring.

Also - there's nothing wrong with using VARCHAR(MAX) - but only when it's needed and when it makes sense!

See:

for some good reasons why you shouldn't just make everything VARCHAR(MAX) - just because you could....

marc_s

And what if I use VARCHAR(MAX) instead of fixed-length VARCHAR? And what about other databases like MySQL in this regard?

  • PostgreSQL treats VARCHAR(n) as if it were TEXT CHECK(LENGTH(Column) <= n). There's no performance advantage to specifying a maximum length.
  • SQLite completely ignores length limits on VARCHAR columns.
  • MS SQL Server, however, does not allow VARCHAR(MAX) columns as index key columns, which hurts lookup performance.
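The SQL Server restriction in the last bullet is easy to demonstrate. This sketch uses illustrative names; the fixed-length variant indexes fine, while the MAX variant is rejected as an index key:

```sql
-- Illustrative table with both variants of the same logical column.
CREATE TABLE Demo (
    IdMax VARCHAR(MAX),
    IdFix VARCHAR(50)
);

-- Fails: a VARCHAR(MAX) column is invalid as an index key column.
CREATE INDEX IX_Demo_Max ON Demo (IdMax);

-- Succeeds: fixed-length VARCHAR can be an index key (subject to the
-- engine's overall index key size limit).
CREATE INDEX IX_Demo_Fix ON Demo (IdFix);
```

A VARCHAR(MAX) column can still appear in an index via INCLUDE, but it cannot be seeked on, which is what matters for a WHERE CustomerId = ... predicate.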
dan04