Split rows by columns content in R

Question

I am struggling with something that should be simple. I have a data frame in which the column Comments holds values specific to individual field observations (times, in this case). For example:

data <- data.frame("Species" = c("TURPHI", "EMBHER", "ANTTRI"),
                   "Date" = c("2020/06/03", "2020/06/03", "2020/06/03"),
                   "Comments" = c("21:00;23:00;23:45", "22:01", "21:51"),
                   stringsAsFactors = FALSE)
> data
  Species       Date          Comments
1  TURPHI 2020/06/03 21:00;23:00;23:45
2  EMBHER 2020/06/03             22:01
3  ANTTRI 2020/06/03             21:51

I need to split each row that has more than one element in the Comments column. Elements are delimited by ;. In the example above, row 1 has to be split into 3 rows, one per time, like this:

> data
  Species       Date  Time
1  TURPHI 2020/06/03 21:00
2  TURPHI 2020/06/03 23:00
3  TURPHI 2020/06/03 23:45
4  EMBHER 2020/06/03 22:01
5  ANTTRI 2020/06/03 21:51

Thanks a lot!!

score 2 · Accepted Answer · answered Aug 17 '20 at 14:04

You can try this. You only need to give the position of the column to split (here, 3), and you can save the result in a new data frame:

library(tidyverse)

df1 <- separate_rows(data, 3, sep = ";")

Output:

# A tibble: 5 x 3
  Species Date       Comments
  <chr>   <chr>      <chr>   
1 TURPHI  2020/06/03 21:00   
2 TURPHI  2020/06/03 23:00   
3 TURPHI  2020/06/03 23:45   
4 EMBHER  2020/06/03 22:01   
5 ANTTRI  2020/06/03 21:51
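
The output above still calls the split column Comments, while the question asks for Time. A minimal sketch of one way to get that, assuming dplyr is also available and referring to the column by name rather than position (the rename step is an addition, not part of the original answer):

```r
library(tidyr)
library(dplyr)

# Sample data from the question
data <- data.frame(Species = c("TURPHI", "EMBHER", "ANTTRI"),
                   Date = c("2020/06/03", "2020/06/03", "2020/06/03"),
                   Comments = c("21:00;23:00;23:45", "22:01", "21:51"),
                   stringsAsFactors = FALSE)

# Split Comments on ";" into one row per element,
# then rename the column to match the desired output
df1 <- data %>%
  separate_rows(Comments, sep = ";") %>%
  rename(Time = Comments)

df1
```

Naming the column makes the call independent of column order, which may matter if the data frame gains or loses columns later.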

answered Aug 17 '20 at 14:04

Duck

Maybe `library(tidyr)` would be more precise, since it names the specific package. – Darren Tsai Aug 17 '20 at 14:35

@DarrenTsai Oh yes, you are right! But as tidyverse also loads `tidyr`, I used `tidyverse`. Sorry for confusing you :) – Duck Aug 17 '20 at 14:36

GKi · Answer 2 · 2020-08-17T14:16:02.833

In base R you can use strsplit to split data$Comments by ;, repeat each row once per resulting element using rep, and then combine the repeated rows with the split values using cbind.

x <- strsplit(data$Comments, ";", fixed = TRUE)
cbind(data[rep(seq_len(nrow(data)), lengths(x)),-3], time=unlist(x))
#    Species       Date  time
#1    TURPHI 2020/06/03 21:00
#1.1  TURPHI 2020/06/03 23:00
#1.2  TURPHI 2020/06/03 23:45
#2    EMBHER 2020/06/03 22:01
#3    ANTTRI 2020/06/03 21:51
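
The row names in the output above (1, 1.1, 1.2, ...) come from indexing the same row more than once. If the plain 1 to 5 numbering and the column name Time from the question are wanted, a small variation along these lines should work (the names idx and res are only illustrative):

```r
# Sample data from the question
data <- data.frame(Species = c("TURPHI", "EMBHER", "ANTTRI"),
                   Date = c("2020/06/03", "2020/06/03", "2020/06/03"),
                   Comments = c("21:00;23:00;23:45", "22:01", "21:51"),
                   stringsAsFactors = FALSE)

# One character vector of times per original row
x <- strsplit(data$Comments, ";", fixed = TRUE)

# Repeat each row index once per time found in that row
idx <- rep(seq_len(nrow(data)), lengths(x))

# Keep the other columns by name and attach the flattened times
res <- cbind(data[idx, c("Species", "Date")], Time = unlist(x))

# Drop the 1, 1.1, 1.2, ... row names produced by the repeated indexing
rownames(res) <- NULL

res
```

Selecting the kept columns by name instead of `-3` avoids depending on the position of Comments.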
