1

I have a data frame in R which has a column like this:

1-2hours
2-3minutes
1-1hours
3-2days
4-1months

I want to extract it into two columns: ID and duration

The duration column is a calculated column for example for the first row it should be 2 * 60 =120 and for the second column it should be 3 and for the third row, it should be: 1 * 60 =60 and so on.

The ID is what is appear before - and duration is a calculated value for example number of minutes or

My solution is to first extract ID, so then I have two columns:

ID    DurationTmp
1     2hours
2     3minutes
1     1hours
3     2days
4     1month

then using code like this to find what is each row:

df %>% mutate(duration_unit_minute=endsWith(DurationTmp, 'minutes'))-> df

do the above code for all types of duration and then use this code to extract minutes and put into one column:

df %>% mutate(duration_minute=ifelse(duration_unit_minutes, substr(DurationTmp,1,nchar(DurationTmp)-nchar('minutes')),0))-> df

This is too complex and possibly not the best way of doing this.

What is the best way of doing this?

Please note that the task is not just to extract the columns, but also to create a calculated column. The first part which is extraction is simple, but the second part which is calculation (converting minutes and day and .. into one standard calculated column) is the difficulty.

mans
  • 17,104
  • 45
  • 172
  • 321
  • @IanCampbell If the OP meant to create new columns from the split, the one you tagged may not work – akrun Mar 08 '22 at 18:22
  • @IanCampbell Can you please elaborate? how can I use Lubridate to achieve this? – mans Mar 08 '22 at 18:28

1 Answers1

2

This can easily be accomplished with tidyr::separate and lubridate::as.duration:

library(tidyverse);library(lurbidate)
data %>%
  tibble %>%
  separate(col = 1, into = c("ID","Duration"), sep = "-") %>% 
  mutate(Duration = as.duration(Duration))
## A tibble: 5 × 2
#  ID    Duration              
#  <chr> <Duration>            
#1 1     7200s (~2 hours)      
#2 2     180s (~3 minutes)     
#3 1     3600s (~1 hours)      
#4 3     172800s (~2 days)     
#5 4     2629800s (~4.35 weeks)

Sample data:

data <- c("1-2hours", "2-3minutes", "1-1hours", "3-2days", "4-1months")
Ian Campbell
  • 23,484
  • 14
  • 36
  • 57