I was given a ragtag assortment of data to analyze and have run into a predicament. I've got a ~2 million row table with a non-unique identifier column of datatype varchar(50). This identifier is unique to a personID. Until I figure out exactly how I need to normalize this junk, I have another question that might help me right now: if I change the datatype to a varchar(25), for instance, will that help queries run faster when they're joined on a non-PK field? All of the characters in the string are digits, but trying to convert them to an int would cause overflow. Or could I somehow index the column for the time being to get some of the queries to run faster?
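To make the indexing option concrete, here is a minimal sketch of what I have in mind. The table name person_data and the column name identifier are placeholders rather than the real schema, and the exact syntax may vary by database engine.

```sql
-- Hypothetical names: person_data stands in for the ~2 million row table,
-- identifier for the varchar(50) column used in the joins.
-- The index is non-unique because the identifier repeats across rows.
CREATE INDEX ix_person_data_identifier
    ON person_data (identifier);
```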
EDIT: The personID will be a foreign key to another table with demographic information about a person.