A general database design question:
Suppose I'm putting together a fairly large database - say tens of thousands of rows, not many columns - and I want one of those columns to be an identifying 'tag' that would most obviously be a long-ish string of 30-50 characters.
And the number of distinct tags would be fairly limited.
Every introductory thing I've read about SQL and database design suggests that you should be careful about choosing the right data types for your info - don't store info as a float if you can get by storing it as an integer, don't store info as text if you can store it as a number - to keep your database from getting unnecessarily large. But I'm not sure how much difference that makes to performance.
In my particular case, it seems like it would be easy enough to create a second table with all of my English 'tags' paired with some arbitrary integer 'code'. Then, I could store only the integer codes in my main table, making it smaller. But, then, for a user to access the data they want from the main table - without knowing the integer codes and only knowing the English tags - every query would (I think) have to look something like:
"SELECT * FROM maintable WHERE code = (SELECT code FROM secondtable WHERE tag = 'English tag')"
That seems like it would work, but then every query would effectively be two queries.
So, is there likely to be a meaningful performance benefit from making the main table smaller, by replacing the English 'tags' with integer 'codes'? And, if so, is that performance improvement likely to outweigh the performance cost of essentially doubling the number of queries?
Is there a rule of thumb on this issue? A generally preferred way of proceeding? If so, I would love to know before getting too far in.