
I was given a ragtag assortment of data to analyze and am running into a predicament. I've got a ~2 million row table with a non-unique identifier of datatype varchar(50). This identifier is unique to a personID. Until I figure out exactly how I need to normalize this junk, I've got another question that might help me right now: if I change the datatype to a varchar(25), for instance, will that help queries run faster when they're joined on a non-PK field? All of the characters in the string are digits, but trying to convert them to an int would cause overflow. Or could I possibly index the column for the time being to get some of the queries to run faster?

EDIT: The personID will be a foreign key to another table with demographic information about a person.

wootscootinboogie

2 Answers


Technically, the length of a varchar specifies its maximum length.

The actual stored length is variable (hence the name), so a lower maximum won't change how comparisons are evaluated: they operate on the actual string, not on the declared maximum.
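As a quick illustration (a minimal sketch; the variable names are made up), DATALENGTH shows that the same value occupies the same number of bytes whether the declared maximum is 50 or 25:

    -- Same 20-character value stored under two declared maximums
    DECLARE @wide   varchar(50) = '12345678901234567890';
    DECLARE @narrow varchar(25) = '12345678901234567890';

    -- Both return 20: storage depends on the actual string, not the declared max
    SELECT DATALENGTH(@wide) AS WideBytes, DATALENGTH(@narrow) AS NarrowBytes;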

For more information, check this MSDN article and this Stack Overflow post.

Yan Brunet

Changing varchar(50) to varchar(25) would certainly reduce the record size in that table, thereby reducing the number of database pages the table occupies and improving query performance (perhaps only marginally), but such an ALTER TABLE statement might take a long time.
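A sketch of that change (assuming a hypothetical table dbo.BigTable with the identifier in a personID column; check for over-length values first so nothing gets truncated):

    -- Make sure no existing value exceeds the new maximum
    SELECT COUNT(*) FROM dbo.BigTable WHERE LEN(personID) > 25;

    -- Shrink the column; on a ~2 million row table this can lock the table for a while
    ALTER TABLE dbo.BigTable
        ALTER COLUMN personID varchar(25) NOT NULL;  -- assumes the column is currently NOT NULL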

Alternatively, if you define an index on the join columns, and if your retrieval list is small, you can also include those columns in the index definition (a covering index); that too would bring down query execution times significantly.
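For example (the index and column names here are hypothetical), assuming the query joins on personID and only reads firstName and lastName:

    -- Nonclustered index on the join column; INCLUDE makes it a covering index
    -- for queries that select only firstName and lastName
    CREATE NONCLUSTERED INDEX IX_BigTable_personID
        ON dbo.BigTable (personID)
        INCLUDE (firstName, lastName);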

Vikdor