3

I would like to detect all time-like strings in a webpage and then use strtotime() in PHP to get Unix timestamps. Is there a way to detect time-like strings using PHP? I could use a regex for a particular page, but I am looking for something universal, or at least something that detects most of the possible time/date string formats. Thanks for reading this.

This is nice, but limited: Matching a time string with a regular expression

sam
  • Will this include times like "Yesterday", "Today", and "Next year", or just numeric representations? I'm also interested in hearing why you are doing this; there may be better solutions if you are open to them (unless this question is simply an exercise?). – Wesley Murch Apr 17 '11 at 14:35
  • Thanks for your reply. I am indexing a group of websites, some of which are forums. My thinking is that if I can get all the time strings, put them in an array, and take the max value, that gives me a reliable indication of whether the page has been updated and when. No, it would not include today, tomorrow, or yesterday; only something more structured than that. – sam Apr 17 '11 at 14:42
  • Wouldn't it perhaps be less expensive to compute a sha1 of the whole HTML string of each page every day and compare it against the latest sha1 you have saved? – AJJ Apr 17 '11 at 14:51
  • That definitely tells me it has changed, but when? – sam Apr 17 '11 at 14:53
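For reference, the hash comparison suggested in the comments could be sketched roughly as below. The function name `page_has_changed` is hypothetical, and how the previous hash is stored is left open. As noted above, this only says whether the page changed, not when.

```php
<?php
// Hypothetical helper (not from the thread): reports whether a page's HTML
// differs from the version whose sha1 hash was saved on the previous run.
function page_has_changed(string $html, ?string $previousHash): bool
{
    // No saved hash yet means the page counts as changed (first visit).
    if ($previousHash === null) {
        return true;
    }
    return sha1($html) !== $previousHash;
}
```

Recording the time at which a change is first observed would give an upper bound on the update time, at the resolution of the crawl interval.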

2 Answers

2

Similar question here:

How to convert String to Date without knowing the format?

The consensus is that you need to know the incoming format. You could also try matching the incoming string against a discrete list of known formats first, in an attempt to determine the format. You hinted at this by mentioning regex in your question. Those are really the only two ways.
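A minimal sketch of the "list of known formats" approach might look like the following. The function name `find_latest_timestamp` and the three patterns are assumptions chosen for illustration; a real list would need to cover whatever formats the indexed sites actually use. Each regex match is handed to strtotime(), and anything it rejects is dropped.

```php
<?php
// Hypothetical helper: scans plain text for a few known date formats and
// returns the latest Unix timestamp found, or null if none parse.
function find_latest_timestamp(string $text): ?int
{
    // Illustrative pattern list only, not exhaustive.
    $months = '(?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)[a-z]*';
    $patterns = [
        // 2011-04-17, optionally followed by 14:35 or 14:35:00
        '/\b\d{4}-\d{2}-\d{2}(?: \d{2}:\d{2}(?::\d{2})?)?\b/',
        // 17 April 2011 or 17 Apr 2011
        '/\b\d{1,2} ' . $months . ' \d{4}\b/',
        // April 17, 2011
        '/\b' . $months . ' \d{1,2}, \d{4}\b/',
    ];

    $timestamps = [];
    foreach ($patterns as $pattern) {
        if (preg_match_all($pattern, $text, $matches)) {
            foreach ($matches[0] as $candidate) {
                // strtotime() returns false for strings it cannot parse.
                $ts = strtotime($candidate);
                if ($ts !== false) {
                    $timestamps[] = $ts;
                }
            }
        }
    }

    return $timestamps ? max($timestamps) : null;
}
```

For an HTML page, the input would typically be passed through strip_tags() first, for example `find_latest_timestamp(strip_tags($html))`. Note that strtotime() interprets times in the default timezone, so results depend on date_default_timezone_set().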

AJ.
  • So I will need to develop an array of the time/date formats I am expecting, use regex to detect them, and then use strtotime() to convert them to Unix time. I guess the array of time/date formats should not be too big, so I can develop it manually. I wish there was a more refined solution. – sam Apr 17 '11 at 14:48
  • The approaches used here might be helpful: https://github.com/etiennetremel/PHP-Find-Date-in-String – user2761030 Oct 02 '14 at 12:07
1

You could try looking at the underlying implementation of strtotime() itself and see how it's done; it might give you some ideas.

Chris Tonkinson
  • http://us.php.net/manual/en/datetime.formats.php contains all the formats recognized by PHP. I guess I can start here to make a list of possible strings to look for in the HTML. It includes all the regexes as well. Thanks for pointing me in this direction. – sam Apr 17 '11 at 14:59
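Since PHP's own parser already knows those formats, one possible shortcut is to let it vet candidate strings instead of re-implementing every rule. The sketch below uses date_parse(), which reports parse errors and which fields it found; the helper name `looks_like_date` and the rule "must contain a full year, month, and day" are assumptions, not something from the thread.

```php
<?php
// Hypothetical helper: true when PHP's date parser accepts the string
// without errors and finds a complete calendar date in it.
function looks_like_date(string $candidate): bool
{
    $parsed = date_parse($candidate);

    // date_parse() sets fields it could not find to false, so relative
    // strings such as "tomorrow" and bare times such as "14:35" are rejected.
    return $parsed['error_count'] === 0
        && $parsed['year'] !== false
        && $parsed['month'] !== false
        && $parsed['day'] !== false;
}
```

Candidate substrings would still have to be extracted from the page first (for example with a loose regex), with this check filtering out the false positives before calling strtotime().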