You have the right idea. Don't repeat the same city multiple times. It will slow down your queries in part because the city names will take up much more space in memory than integers. Larger tables take more time to process, and with almost a billion and a half rows, it should make a significant difference. See this posting.
Repeating the same name thousands of times also makes it a lot more likely that you'll introduce variations (typos) that make your query results unreliable. Finally, this design will make it much tougher to make changes to a city name.
Replace the plain city with a unique city_id
for each city. In another city
table, you'd have that same city_id
and a plain name
that holds the full city's name.
To profit from the change, you must let MySQL know of this relationship. When you create the tables, be sure to make city_id
the primary key of the city table, and in your big table make city_id
a foreign key.
This change will also allow you to expand your database more easily if you want to start tracking more info on a city. To learn more, look into database normalization.