I don't think this needs a lot of machine learning as such. A bit of NLP is helpful to get the dependencies from the sentence, but even that isn't strictly necessary.
You could start off with just looking at keywords `monday`, `tuesday` etc. and then do a look around to see what is around them: `last monday`, `next monday`, `coming monday`, `previous monday` and so on. These are called window features because they provide a window of +/- 1, 2, 3 ... tokens around the feature you are interested in, `monday`. The `around 5pm` you could theoretically also get from just looking at window features; I don't have an intuition as to how noisy that would be. Try to think of all the ways of expressing time in that context, and then think of how those ways can be mixed up with something else. Off the top of my head it would seem relatively easy to do that.
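To make the window-feature idea concrete, here is a minimal sketch in Python. The keyword list, the crude regex tokenizer, and the window size of 2 are all illustrative choices on my part, not a description of what Google actually does:

```python
import re

# Illustrative sketch: find day-of-week keywords and grab a +/- 2 token
# window around each hit. Keyword set and window size are arbitrary.
DAYS = {"monday", "tuesday", "wednesday", "thursday", "friday",
        "saturday", "sunday"}

def window_features(text, size=2):
    tokens = re.findall(r"\w+|\S", text.lower())
    hits = []
    for i, tok in enumerate(tokens):
        if tok in DAYS:
            lo, hi = max(0, i - size), min(len(tokens), i + size + 1)
            # keep the tokens to the left and right of the keyword
            hits.append((tok, tokens[lo:i], tokens[i + 1:hi]))
    return hits

print(window_features("Can we meet next Monday around 5pm?"))
# [('monday', ['meet', 'next'], ['around', '5pm'])]
```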
Anyhow, the other way is to use a dependency parser to extract the grammatical relations between the elements in the sentence. This requires you to Part-of-Speech (POS) tag the sentence (after splitting it into tokens). The POS tagger would need to be trained to recognize that `friday` and `monday` are nouns, perhaps even that they are temporal expressions; the same goes for `5pm` and `around 5pm`. That does require machine learning, and a lot of it. The benefit Google has over others is that they have a lot of data, which gives them lots and lots of examples of the different ways of expressing what is essentially the same thing. This gives their models a lot of breadth. Once you've got the sentence POS tagged, you feed it to a dependency parser (such as the Stanford Dependency Parser), which tells you what the relations between all the different tokens in the sentence are.
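As a rough illustration of that tag-then-parse pipeline, here is a sketch using spaCy's pretrained English model rather than the Stanford tools; the tags and relation labels you get are simply whatever that particular model emits:

```python
import spacy  # pip install spacy && python -m spacy download en_core_web_sm

# Load a small pretrained English pipeline (tokenizer, tagger, parser).
nlp = spacy.load("en_core_web_sm")
doc = nlp("Can we meet on Monday around 5pm?")

for token in doc:
    # token.pos_ is the part-of-speech tag; token.dep_ is the grammatical
    # relation linking the token to its head in the dependency tree.
    print(f"{token.text:8} {token.pos_:6} {token.dep_:10} -> {token.head.text}")
```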
Again, Google has a lot of data, which helps. On top of all this, Google has had years to hone the output of its models, so that when a model isn't entirely sure what is going on it won't highlight/extract the result. In terms of actually applying NLP in the real world this last step is very important, because it gives people confidence in what the system is doing. Basically, if the software isn't sure what is happening, do nothing, because doing something risks doing the wrong thing, which then reduces people's confidence in the system as a whole.
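That "when in doubt, do nothing" rule is simple to express in code. In the sketch below, `extract_time` is a hypothetical model that returns a candidate span plus a confidence score, and the threshold is an illustrative value you would tune on real data:

```python
THRESHOLD = 0.9  # illustrative value; in practice tuned on held-out data

def maybe_highlight(sentence, extract_time):
    span, confidence = extract_time(sentence)  # hypothetical model call
    if confidence >= THRESHOLD:
        return span   # confident enough to highlight/extract the result
    return None       # unsure: stay silent rather than risk being wrong
```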
Releasing a reliable, easy-to-use NLP application requires a trade-off between the quality of the NLP/machine learning and the general software engineering needed to hide the cases where the NLP fails from the users.
Try sending yourself emails with the time expressed in different ways and see which ones Google catches and which ones it misses. For instance:
- Can we meet Friday next week?
- How about coffee next week's Friday at 2pm
- I can't do Friday but I can meet Wednesday at 4pm
and so on. It's always interesting to poke holes in technology; it can also reveal quite a lot about what a system is doing and how it is doing it.
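If you want to poke holes programmatically, one crude stand-in is to run the probe sentences above through an off-the-shelf date parser such as `dateutil`. It is nowhere near Google's pipeline, but seeing where even fuzzy parsing stumbles is instructive:

```python
from dateutil import parser  # pip install python-dateutil

# Probe sentences from above; fuzzy=True tells dateutil to skip tokens
# it doesn't recognize and parse whatever date/time fragments remain.
probes = [
    "Can we meet Friday next week?",
    "How about coffee next week's Friday at 2pm",
    "I can't do Friday but I can meet Wednesday at 4pm",
]

for s in probes:
    try:
        print(f"{s!r:55} -> {parser.parse(s, fuzzy=True)}")
    except (ValueError, OverflowError):
        print(f"{s!r:55} -> no parse")
```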