I have a dataset with several different date formats, and I need to standardize them.
URL | Last_Updated | Reviewed | Date_Found | Crawl_Date
URL.html | January 21, 2016 | April 11, 2016 | 2019-02-11T03:50:01Z/ | 2021-03-04 01:27:08
Second, I need a "max" date field that prioritizes the last_updated field. I assume this will end up being some kind of if/then statement, but I'm not sure how to proceed without a standardized date format for the 4 dates I have. Ultimately, I need to use this "max_date" to calculate the number of days between the crawl_date and the last_updated date.
So far, I have this:
URLtoDate$date_diff <- as.Date(as.character(URLtoDate$crawl_date), format = "%Y/%m/%d") - as.Date(as.character(URLtoDate$max.date), format = "%Y/%m/%d")
But because my dates aren't in the same format, I get NAs across the dataset. Any help is greatly appreciated.
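
To make the goal concrete, here is a rough base R sketch of the kind of thing I'm after, not a working solution I've validated on the full data. It assumes the column names from the table header above, an English locale for month names like "January", and that "prioritize" means: use Last_Updated when it is present, otherwise fall back to the later of Reviewed and Date_Found. `parse_mixed_date()` is a hypothetical helper name, and the sample row is just the one shown above.

```r
# Sample row copied from the table above (illustration only).
URLtoDate <- data.frame(
  URL = "URL.html",
  Last_Updated = "January 21, 2016",
  Reviewed = "April 11, 2016",
  Date_Found = "2019-02-11T03:50:01Z/",
  Crawl_Date = "2021-03-04 01:27:08",
  stringsAsFactors = FALSE
)

# Hypothetical helper: try each candidate format in turn and keep the
# first one that parses for each element. Text after the date part
# (time, "Z/") is ignored when parsing with "%Y-%m-%d".
parse_mixed_date <- function(x) {
  x <- trimws(as.character(x))
  formats <- c("%B %d, %Y", "%Y-%m-%d")  # assumed: only these two shapes occur
  out <- rep(as.Date(NA), length(x))
  for (fmt in formats) {
    todo <- is.na(out) & !is.na(x)
    out[todo] <- as.Date(x[todo], format = fmt)
  }
  out
}

# Standardize all four date columns to class Date.
date_cols <- c("Last_Updated", "Reviewed", "Date_Found", "Crawl_Date")
URLtoDate[date_cols] <- lapply(URLtoDate[date_cols], parse_mixed_date)

# Assumed priority rule: Last_Updated wins if present; otherwise use the
# later of Reviewed and Date_Found.
fallback <- pmax(URLtoDate$Reviewed, URLtoDate$Date_Found, na.rm = TRUE)
URLtoDate$max_date <- URLtoDate$Last_Updated
missing_lu <- is.na(URLtoDate$max_date)
URLtoDate$max_date[missing_lu] <- fallback[missing_lu]

# Number of days between the crawl date and the chosen max date.
URLtoDate$date_diff <- as.numeric(URLtoDate$Crawl_Date - URLtoDate$max_date)
```

If other date shapes turn up in the real data, they would need their own entries in `formats`; anything that matches none of them stays NA, which at least makes the unparsed rows easy to find.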