Addresses are complicated. Very complicated. They are highly irregular and subjective things. And logistics companies have spent billions over the course of decades trying to make sense of them.
Better to capitalize on what others have done than to try to reinvent it.
The data you have is actually pretty meaningful. It just doesn't "feel" very meaningful. Businesses like to have their address data split up into lots of little pieces, but why? What do all of those little pieces mean? Why do they need to be distinct from one another? The data you have is an "address". Keep it, but add to it. Make use of existing information to extrapolate more information.
Use a geocoding API (Google? Bing? Some other service? Prices, terms, etc. will vary) to search on the data you have and bring back more strongly typed data. Store that alongside what you have. For example, you have this:
12003 Main St New York NY 00991
So you make a request here:
http://maps.googleapis.com/maps/api/geocode/json?address=12003+Main+St+New+York+NY+00991&sensor=false
And you get back this:
{
   "results" : [
      {
         "address_components" : [
            {
               "long_name" : "D R Main Street",
               "short_name" : "D R Main Street",
               "types" : [ "point_of_interest", "establishment" ]
            },
            {
               "long_name" : "5",
               "short_name" : "5",
               "types" : [ "street_number" ]
            },
            {
               "long_name" : "West 31st Street",
               "short_name" : "W 31st St",
               "types" : [ "route" ]
            },
            {
               "long_name" : "Midtown",
               "short_name" : "Midtown",
               "types" : [ "neighborhood", "political" ]
            },
            {
               "long_name" : "Manhattan",
               "short_name" : "Manhattan",
               "types" : [ "sublocality", "political" ]
            },
            {
               "long_name" : "New York",
               "short_name" : "New York",
               "types" : [ "locality", "political" ]
            },
            {
               "long_name" : "New York",
               "short_name" : "New York",
               "types" : [ "administrative_area_level_2", "political" ]
            },
            {
               "long_name" : "New York",
               "short_name" : "NY",
               "types" : [ "administrative_area_level_1", "political" ]
            },
            {
               "long_name" : "United States",
               "short_name" : "US",
               "types" : [ "country", "political" ]
            },
            {
               "long_name" : "10001",
               "short_name" : "10001",
               "types" : [ "postal_code" ]
            },
            {
               "long_name" : "4414",
               "short_name" : "4414",
               "types" : []
            }
         ],
         "formatted_address" : "D R Main Street, 5 West 31st Street, New York, NY 10001, USA",
         "geometry" : {
            "location" : {
               "lat" : 40.7468529,
               "lng" : -73.9865046
            },
            "location_type" : "APPROXIMATE",
            "viewport" : {
               "northeast" : {
                  "lat" : 40.7482018802915,
                  "lng" : -73.98515561970851
               },
               "southwest" : {
                  "lat" : 40.7455039197085,
                  "lng" : -73.98785358029151
               }
            }
         },
         "partial_match" : true,
         "types" : [ "point_of_interest", "establishment" ]
      }
   ],
   "status" : "OK"
}
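Now that is some meaningful-looking data. Maybe not the "units" of data that somebody in your company thought addresses were made of, but meaningful and useful. Note that this particular result has "partial_match" set to true and a "location_type" of "APPROXIMATE", which means the geocoder is guessing (here it matched a point of interest rather than the street address), so check those fields before trusting a result. For any given address in your data, you can automate this.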
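As a rough illustration of automating this, here is a minimal Python sketch that builds the request URL from a free-form address and flattens the "address_components" list into a lookup keyed by type. The names build_geocode_url and components_by_type are made up for this example, the endpoint and the sensor parameter are simply copied from the request shown above, and the sample data is a trimmed copy of the response above. Your provider may well require HTTPS, an API key, or different parameters, so check its documentation.

```python
from urllib.parse import urlencode

# Endpoint copied from the request above; an assumption, not a recommendation.
GEOCODE_ENDPOINT = "http://maps.googleapis.com/maps/api/geocode/json"


def build_geocode_url(address: str) -> str:
    """Build the request URL for a free-form address (hypothetical helper)."""
    # urlencode escapes spaces as '+', matching the URL shown above.
    return GEOCODE_ENDPOINT + "?" + urlencode({"address": address, "sensor": "false"})


def components_by_type(result: dict) -> dict:
    """Map each component type (e.g. 'postal_code') to its long_name."""
    by_type = {}
    for component in result.get("address_components", []):
        for type_name in component.get("types", []):
            # Keep the first component seen for a type; ignore later duplicates.
            by_type.setdefault(type_name, component["long_name"])
    return by_type


# A trimmed copy of the response above, so the sketch runs without a network call.
sample_result = {
    "address_components": [
        {"long_name": "5", "short_name": "5", "types": ["street_number"]},
        {"long_name": "West 31st Street", "short_name": "W 31st St", "types": ["route"]},
        {"long_name": "New York", "short_name": "New York", "types": ["locality", "political"]},
        {"long_name": "10001", "short_name": "10001", "types": ["postal_code"]},
        {"long_name": "4414", "short_name": "4414", "types": []},
    ],
}

parts = components_by_type(sample_result)
print(build_geocode_url("12003 Main St New York NY 00991"))
print(parts["street_number"], parts["route"], parts["postal_code"])
```

Components with an empty "types" list (like "4414" above) are simply skipped here; whether that is the right call depends on what you intend to store.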
Let users enter their address the way they know it. Store that subjective address as the user-entered version. Geocode it to get more structured data to store alongside it.
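One possible shape for "store it alongside" is sketched below in Python. The AddressRecord class, its field names, and record_from_response are all hypothetical; the response keys come from the sample response above. The user-entered string is always kept, and the geocoded fields stay empty when the lookup fails.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AddressRecord:
    """Hypothetical record: the user's own text plus whatever the geocoder adds."""
    entered: str                       # exactly what the user typed
    formatted: Optional[str] = None    # geocoder's formatted_address
    lat: Optional[float] = None
    lng: Optional[float] = None
    postal_code: Optional[str] = None
    partial_match: bool = False        # True means the geocoder was guessing


def record_from_response(entered: str, response: dict) -> AddressRecord:
    """Combine the entered address with the first geocoding result, if any."""
    record = AddressRecord(entered=entered)
    if response.get("status") != "OK" or not response.get("results"):
        return record  # keep the user's text even when geocoding fails
    best = response["results"][0]
    location = best.get("geometry", {}).get("location", {})
    record.formatted = best.get("formatted_address")
    record.lat = location.get("lat")
    record.lng = location.get("lng")
    record.partial_match = bool(best.get("partial_match", False))
    for component in best.get("address_components", []):
        if "postal_code" in component.get("types", []):
            record.postal_code = component["long_name"]
            break
    return record


# A trimmed copy of the response shown above.
sample_response = {
    "results": [
        {
            "address_components": [
                {"long_name": "10001", "short_name": "10001", "types": ["postal_code"]},
            ],
            "formatted_address": "D R Main Street, 5 West 31st Street, New York, NY 10001, USA",
            "geometry": {"location": {"lat": 40.7468529, "lng": -73.9865046}},
            "partial_match": True,
        }
    ],
    "status": "OK",
}

record = record_from_response("12003 Main St New York NY 00991", sample_response)
failed = record_from_response("nowhere at all", {"results": [], "status": "ZERO_RESULTS"})
print(record)
print(failed)
```

Keeping partial_match on the record lets you flag addresses for review later instead of silently trusting a guess.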