I also work at SmartyStreets, and since I'm not a developer I'm not bound by any constraints such as "it can't be done" or "there's no way to do it reliably". In fact, the ideas that I come up with may not always be possible, but I'm a problem-solver, a solution-finder, and this particular problem absolutely has a solution.
You'll need the following: a little regex, knowledge of a scripting language (Python, PHP, whatever you prefer) and access to an address validation tool (this is required so that you know when you get it right).
So, let's start with the example sentence:
Meet me at 1234 Apple Street New York, NY 10011 See you there!
We can be sure that every address has a beginning and an end. (You can take that to the bank!)
So, if you run a regular expression that looks for the beginning of the address within the string, you can eliminate everything before the address begins. Here's a regex that matches the text in front of the address, so that you can strip it off:
(^(.*(?=p\.?o\.? box|h\.?c\.?r\.? |c\.?m\.?r\.?)|^[^0-9]+))
This will give you back the following:
1234 Apple Street New York, NY 10011 See you there!
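If you want to try that step in code, here's a minimal Python sketch. The function name `strip_leading_text` is just something I made up for illustration, and the case-insensitive flag is my assumption (so that "PO Box" matches as well as "po box"); the regex itself is the one above.

```python
import re

# The regex from the answer. IGNORECASE is an assumption, added so that
# "PO Box" and "po box" are treated the same way.
LEADING_TEXT = re.compile(
    r"(^(.*(?=p\.?o\.? box|h\.?c\.?r\.? |c\.?m\.?r\.?)|^[^0-9]+))",
    re.IGNORECASE,
)


def strip_leading_text(text: str) -> str:
    """Drop whatever comes before the likely start of an address."""
    # The pattern is anchored at the start of the string, so one
    # substitution is all that is needed.
    return LEADING_TEXT.sub("", text, count=1)


print(strip_leading_text(
    "Meet me at 1234 Apple Street New York, NY 10011 See you there!"
))
# prints: 1234 Apple Street New York, NY 10011 See you there!
```

Note that a string with no digits and no PO Box, HCR or CMR marker comes back empty, which is a reasonable signal that there's no address in it.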
Now you're halfway there, but you'll need to loop through the remaining string. Another assumption that you can certainly make is that an address will never be longer than 328 characters (I made up that number, but you get the picture). An address has to have an end as well, and you can shorten the string by determining the maximum acceptable USPS address length.
You're going to loop through the address string until you get a valid address out of it. To do this, start at the beginning and extend the candidate one word to the right with each additional permutation. This is where the address validation service comes in handy, because you have no idea where the address ends and that's what you need to know. So, each permutation you generate from the string (remember, you're starting from the left side) will be sent for validation. Since no valid address can have fewer than two words, you'll start there. Here are the permutations from the example address as well as the validation results (I'm trying each one by entering it in the address line of the address search box on smartystreets.com):
1234 Apple ==> fail
1234 Apple Street ==> fail
1234 Apple Street New ==> fail
1234 Apple Street New York ==> fail
1234 Apple Street New York, NY ==> Bingo, valid address match. No need to keep iterating.
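The loop above could be sketched in Python roughly like this. Everything named here is hypothetical: `candidate_prefixes` and `first_valid_address` are illustrative names, `is_valid` stands in for whatever call you make to your validation service (no real SmartyStreets API is shown), and `MAX_ADDRESS_CHARS` is the made-up 328 from earlier, not an official USPS figure.

```python
from typing import Callable, Iterator, Optional

MAX_ADDRESS_CHARS = 328  # made-up cap from the text, not a USPS number


def candidate_prefixes(text: str, min_words: int = 2) -> Iterator[str]:
    """Yield longer and longer word prefixes, starting at min_words words."""
    # Cutting at a character count can clip the last word; for a sketch
    # that is acceptable, since a clipped word just fails validation.
    words = text[:MAX_ADDRESS_CHARS].split()
    for end in range(min_words, len(words) + 1):
        yield " ".join(words[:end])


def first_valid_address(
    text: str, is_valid: Callable[[str], bool]
) -> Optional[str]:
    """Return the first prefix the validator accepts, or None."""
    for candidate in candidate_prefixes(text):
        if is_valid(candidate):
            return candidate  # no need to keep iterating
    return None


# Stand-in validator that only mimics the results listed above.
def fake_validator(candidate: str) -> bool:
    return candidate == "1234 Apple Street New York, NY"


print(first_valid_address(
    "1234 Apple Street New York, NY 10011 See you there!", fake_validator
))
# prints: 1234 Apple Street New York, NY
```

In real use you'd swap `fake_validator` for a function that sends the candidate to your validation tool and returns whether it matched.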
Now, obviously the Apple Street example is a made-up address, but you can try the same thing with a real address and you'll get the same results. This isn't the most sophisticated method to extract a valid address from a string, but it certainly works. And, since SmartyStreets allows you to send up to 100 addresses per query, you could permute the address string up to 99 times and get the results back in under 300ms. This won't work with every address, as you'll certainly find out, but it can very easily handle a large majority of them, regardless of how obscured the address is within the text string.
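To take advantage of sending many candidates per query, you could group them before validating. Here's a rough sketch, assuming a hypothetical `validate_batch` function that takes a list of candidate strings and returns one True/False per candidate in the same order. That function is a placeholder for your own call to the validation service, not an actual SmartyStreets client method, and `first_valid_in_batches` is likewise just an illustrative name.

```python
from typing import Callable, List, Optional, Sequence

BATCH_LIMIT = 100  # per-query limit mentioned in the text


def first_valid_in_batches(
    candidates: Sequence[str],
    validate_batch: Callable[[List[str]], List[bool]],
    batch_size: int = BATCH_LIMIT,
) -> Optional[str]:
    """Validate candidates in groups and return the first one that passes."""
    for start in range(0, len(candidates), batch_size):
        batch = list(candidates[start:start + batch_size])
        results = validate_batch(batch)  # one request per batch
        for candidate, ok in zip(batch, results):
            if ok:
                return candidate
    return None


# Stand-in for a real batch request; it only mimics one known result.
def fake_validate_batch(batch: List[str]) -> List[bool]:
    return [c == "4219 jon young orlando fl" for c in batch]


words = "4219 jon young orlando fl 32839 See you there!".split()
candidates = [" ".join(words[:n]) for n in range(2, len(words) + 1)]
print(first_valid_in_batches(candidates, fake_validate_batch))
# prints: 4219 jon young orlando fl
```

Because the candidates go out shortest first, the first hit in a batch is the shortest prefix that validates, which matches the one-at-a-time loop.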
So, we started with "Meet me at 1234 Apple Street New York, NY 10011 See you there!" and in less than half a second came up with "1234 Apple Street New York, NY 10011-1000".
Pretty cool, huh? It even sounds really easy coming from a non-programmer.
Let's try it with a real address:
Meet me at 4219 jon young orlando fl 32839 See you there!
Apply regex and you get:
4219 jon young orlando fl 32839 See you there!
Permute, iterate, validate:
4219 jon ==> fail
4219 jon young ==> fail
4219 jon young orlando ==> fail
4219 jon young orlando fl ==> Bingo, valid address match.
