I'm building a Twitter bot that will listen for tweets like the following:
Hey @twitterbot, I'm looking for restaurants around 123 Main Street, New York
or, another example:
@twitterbot, what's near Yonge & Dundas, Toronto? I'm hungry!
It'll then reply with the kind of data you'd expect these questions to return. I've got most of the problem solved, but I'm stuck on something that shouldn't be so hard: extracting the address from the tweet.
I'll be forwarding the address to a geocoding service to get lat/lng, so I don't need to format or prepare the address in any way; I just need to isolate it from unrelated text like "I'm looking for restaurants around" or "I'm hungry!".
Are there any NLP tools that can identify an address within a block of text? Any suggestions for another way to go about it? Because Google's geocoder handles such a wide array of address formats (even a point of interest like 'The Eaton Centre, Toronto' counts as an address), a single regex can't reliably pluck the address out.
Phrased another way, I just want to remove any text that is not part of an address.
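To make the goal concrete, here is a rough Python sketch of the behaviour I'm after. It is only a naive baseline, not a real solution: the function name `guess_address` and the `CUES` word list are placeholders I made up for illustration, and it assumes the address follows a cue word like "around" or "near" and ends at the first "?" or "!".

```python
import re

# Hypothetical cue phrases that tend to precede the location in these tweets.
CUES = ("around", "near", "close to")


def guess_address(tweet: str) -> str:
    """Naively strip text that is probably not part of the address."""
    # Drop @mentions such as "@twitterbot".
    text = re.sub(r"@\w+", " ", tweet)

    # Keep only what follows the first cue phrase, if one is present.
    pattern = r"\b(?:" + "|".join(re.escape(c) for c in CUES) + r")\b\s+(.*)"
    match = re.search(pattern, text, flags=re.IGNORECASE | re.DOTALL)
    if match:
        text = match.group(1)

    # Cut trailing chatter at the first "?" or "!".
    text = re.split(r"[?!]", text, maxsplit=1)[0]

    # Collapse whitespace and trim stray punctuation at the edges.
    return " ".join(text.split()).strip(" ,.")


print(guess_address("Hey @twitterbot, I'm looking for restaurants around 123 Main Street, New York"))
print(guess_address("@twitterbot, what's near Yonge & Dundas, Toronto? I'm hungry!"))
```

This copes with the two example tweets above, but it leaves noise in as soon as the phrasing changes (for example, when the address comes before the question), which is why I'm hoping for something more robust.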
The solution needs to handle addresses in the US and Canada.
There are some similar questions on Stack Overflow, but none that I could find tackle this exact problem. Because Google's geocoder is so forgiving, the solution doesn't have to be perfect; it just needs to strip enough of the surrounding noise that Google can work out which location I mean.
I'm very new to NLP so I'd appreciate any guidance on the subject.