
In my dataset I have a column named Duration. From it I want to split the hours and minutes into two separate columns. If either the hours or the minutes part is missing, I want to fill in 0h or 0m accordingly.

I have included the existing column details, as well as the expected new columns, in the image below:

train <- read.csv("sampledata.csv", stringsAsFactors = FALSE)
train$Duration

[Image: existing Duration values and the expected hours and minutes columns]

Edit:

sampledata <- data.frame(
   emp_id = 1:5,
   Duration = c("10h 50m","5h 34m","9h","4h 15m","23m"),
   stringsAsFactors = FALSE
)

sampledata$Duration
prasanth
  • Why is this question getting downvoted? Would people have the courtesy to point out the mistake? – prasanth Mar 30 '19 at 15:42
  • Please edit the question as stated here: https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example – NelsonGon Mar 30 '19 at 15:44
  • Downvoted for posting data as an image and showing no effort to solve the problem. You can try the `lubridate` package and ask others for help once you have actual code. – pogibas Mar 30 '19 at 15:46
  • Thanks for the pointers. I attached it as an image because I wasn't aware how to display it as text in table format here. I had it as one column in a CSV file but didn't find a way to attach the file to the question. No worries, thanks for your time!! – prasanth Mar 30 '19 at 15:57
  • Would you always have only hours and minutes in `Duration`? Could it contain seconds or something else? Also, do you need `h` and `m` in the final columns, or would just numbers do, since the columns are already named `hours` and `minutes`? – Ronak Shah Mar 30 '19 at 16:10
  • @RonakShah that column will always have only hours and minutes. Yes, you are right: numbers alone should suffice in the final columns. – prasanth Mar 30 '19 at 16:18
  • Hey, did you accidentally downvote my answer below? – Ronak Shah Apr 07 '19 at 10:29
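Since the asker confirms that Duration only ever holds hours and minutes, a quick format check before splitting can catch unexpected values (such as seconds) early. This is a minimal sketch; the regular expression is an assumption about the exact layout (an optional "<digits>h", an optional single space, then an optional "<digits>m"), and `valid_duration` is a hypothetical name:

```r
sampledata <- data.frame(
  emp_id = 1:5,
  Duration = c("10h 50m", "5h 34m", "9h", "4h 15m", "23m"),
  stringsAsFactors = FALSE
)

# Assumed layout: optional "<digits>h", optional single space, optional "<digits>m";
# nzchar() rejects empty strings, which would otherwise match both optional parts
valid_duration <- grepl("^(\\d+h)?( ?\\d+m)?$", sampledata$Duration) &
  nzchar(sampledata$Duration)

# Rows that do not fit the assumed layout (none for this sample data)
sampledata[!valid_duration, ]
```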

2 Answers


Not the best answer, I would say, but one way would be:

# Turn "10h 50m" into "10-50"; values with only hours ("9h") or only minutes ("23m") are left unchanged
hour_minute <- sub("(\\d+)h (\\d+)m", "\\1-\\2", sampledata$Duration)

sampledata[c("hour", "minutes")] <- t(sapply(strsplit(hour_minute, "-"), function(x) {
  if (length(x) == 2) x                              # both parts present
  else if (endsWith(x, "h")) c(sub("h", "", x), 0)   # hours only: minutes = 0
  else c(0, sub("m", "", x))                         # minutes only: hours = 0
}))

sampledata
  emp_id Duration hour minutes
1      1  10h 50m   10      50
2      2   5h 34m    5      34
3      3       9h    9       0
4      4   4h 15m    4      15
5      5      23m    0      23
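Because `sapply()` over `strsplit()` returns strings, the `hour` and `minutes` columns above are character vectors. Since the asker only needs numbers, one option is to convert them afterwards with `as.integer()`. A minimal sketch that reruns the answer's code on the sample data and then converts the two columns:

```r
sampledata <- data.frame(
  emp_id = 1:5,
  Duration = c("10h 50m", "5h 34m", "9h", "4h 15m", "23m"),
  stringsAsFactors = FALSE
)

# Same splitting logic as the answer above
hour_minute <- sub("(\\d+)h (\\d+)m", "\\1-\\2", sampledata$Duration)
sampledata[c("hour", "minutes")] <- t(sapply(strsplit(hour_minute, "-"), function(x) {
  if (length(x) == 2) x
  else if (endsWith(x, "h")) c(sub("h", "", x), 0)
  else c(0, sub("m", "", x))
}))

# Convert the two character columns to integers
sampledata[c("hour", "minutes")] <- lapply(sampledata[c("hour", "minutes")], as.integer)
str(sampledata)
```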
Ronak Shah

A solution using sub() and gsub() could look like this:

# first identify strings with "h"
h_in_str <- grepl("h", sampledata$Duration)
# if string has "h", then return all before "h" or else return 0
sampledata$Hours <- ifelse(h_in_str, sub("h.*", "", sampledata$Duration), 0)

# identify strings with "m"
m_in_str <- grepl("m", sampledata$Duration)
# if string has "m", drop everything up to and including "h" (plus any following space),
# then keep the leading number; or else return 0
sampledata$Minutes <- ifelse(m_in_str,
  gsub("([0-9]+).*$", "\\1", sub(".*h\\s*", "", sampledata$Duration)), 0)

This gives you the data you are looking for:

sampledata
  emp_id Duration Hours Minutes
1      1  10h 50m    10      50
2      2   5h 34m     5      34
3      3       9h     9       0
4      4   4h 15m     4      15
5      5      23m     0      23
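As an alternative that is not taken from either answer, `regexpr()` with a Perl lookahead can pull out the number written directly before "h" or "m", defaulting to 0 when that part is absent, and returns integers directly. This is a sketch assuming the same hours/minutes-only layout; `extract_unit()` is a hypothetical helper name:

```r
sampledata <- data.frame(
  emp_id = 1:5,
  Duration = c("10h 50m", "5h 34m", "9h", "4h 15m", "23m"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: return the integer written before `unit` ("h" or "m"),
# or 0 when the string has no such part
extract_unit <- function(x, unit) {
  out <- integer(length(x))                                  # defaults to 0
  m <- regexpr(paste0("\\d+(?=", unit, ")"), x, perl = TRUE) # -1 where there is no match
  out[m != -1] <- as.integer(regmatches(x, m))               # regmatches() returns only the matches
  out
}

sampledata$Hours   <- extract_unit(sampledata$Duration, "h")
sampledata$Minutes <- extract_unit(sampledata$Duration, "m")
sampledata
```

The lookahead `(?=h)` requires the unit letter to follow the digits without consuming it, which is why `perl = TRUE` is needed here.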