-1

I have two columns, one of start dates and times and one of end dates and times. I want to split those into four columns: Start_date, Start_time, End_date, End_time. They are separated by a space (each column is formatted such as "12/04/2017 05:25 PM"). Ultimately I need to find the difference between the start date and time and the end date and time. I am a beginner at R so I really appropriate your help.

  • Welcome to StackOverflow! Please read the info about [how to ask a good question](http://stackoverflow.com/help/how-to-ask) and how to give a [reproducible example](http://stackoverflow.com/questions/5963269). This will make it much easier for others to help you. – Ronak Shah Jul 14 '18 at 05:20

1 Answers1

1

For the purposes of this question I am assuming that you are in the United States and thus that the example date which you provided refers to December 4th, 2017.

The first step is for you to convert the two date columns into dates instead of strings. The pattern of the elements in the datetime object must be echoed in the formatting command. Based on the example you provided, I have created a toy dataframe.

df <- data.frame(Start = c("12/04/2017 05:25 PM","05/05/2017 06:25 PM"), End = c("12/09/2018 05:15 PM","05/05/2019 06:24 PM"))
df
            Start                 End
1 12/04/2017 05:25 PM 12/09/2018 05:15 PM
2 05/05/2017 06:25 PM 05/05/2019 06:24 PM

Now to convert these strings into date objects:

library(lubridate)

df$Start <- strptime(df$Start,format="%m/%d/%Y %I:%M %p")
df$End <- strptime(df$End,format="%m/%d/%Y %I:%M %p")
df
            Start                 End
1 2017-12-04 17:25:00 2018-12-09 17:15:00
2 2017-05-05 18:25:00 2019-05-05 18:24:00

You will note that the spaces you indicated are included in the format pattern, along with symbols which indicate which parts of the date appear where as well as whether solo digits are padded with zeros (as all of yours appear to be). For a reference on which symbols/patterns to use in which situation, I recommend this page: https://www.stat.berkeley.edu/~s133/dates.html

If you wish to determine the difference between the two datetimes, it is now a simple matter of subtracting one from the other.

df$diff <- df$End - df$Start
            Start                 End          diff
1 2017-12-04 17:25:00 2018-12-09 17:15:00 369.9931 days
2 2017-05-05 18:25:00 2019-05-05 18:24:00 729.9993 days

In your question you asked about splitting up into pieces. Just in case that is still something that you need to do, creating the date time is still going to help you out. Now that we have datetime objects instead of strings, we can easily split the column into pieces.

df$Start_Day <- day(df$Start)
df$Start_Month<- month(df$Start)
df$Start_Year <- year(df$Start)

and so on.

RealViaCauchy
  • 237
  • 1
  • 10
  • Thank you so much this definitely helped me get started. I created a new data frame with the two columns that contained time and they formatted perfectly. However when I subtract them from each other I get wild answers. For example 2017-12-05 14:23:00 - 2017-12-05 23:20:00 gets a difference of 32220. I'm not sure what I'm missing. The code I used is time_dat$diff = time_dat$end - time_dat$start and I tried reformatting it but that doesn't seem to be working either. Thank you so much for taking the time to walk me through this. – Amelia O'Leary Jul 14 '18 at 15:14
  • Hi Amelia, Sorry I didn't check for your follow up. If `str(df)` does not show that `df$Start` and `df$End` are "POSIXlt" type, then something has gone amiss to prevent you from using simple subtraction. If they show that they are factors, then order to find the difference, you can use the function `difftime()` and choose what unit you would like the time to be in like so `difftime(df$End, df$Start, units = "hours")` – RealViaCauchy Jul 16 '18 at 17:57
  • I'm glad the solution was helpful. If it answered your question, an acceptance would be very appreciated. – RealViaCauchy Jul 16 '18 at 17:57