separate_rows shows double quotes (") around the resulting values after the operation. Is this normal behaviour? How can I prevent it within the same operation, without explicitly removing the quotes afterwards?

df <- data.frame(a = c("c_1", "c_2", "c_3", "c_4", "c_5"), 
                 b = c("a (+1)", "b (+2)", "a (+2), c (+5)", "e (+2)", "b (+2), e (+5)")) 

    a              b
1 c_1         a (+1)
2 c_2         b (+2)
3 c_3 a (+2), c (+5)
4 c_4         e (+2)
5 c_5 b (+2), e (+5)

library(magrittr)  # provides the %>% pipe used below

df %>% tidyr::separate_rows(b, sep = ",", convert = TRUE)
# # A tibble: 7 x 2
#     a     b        
#   <chr> <chr>    
# 1 c_1   "a (+1)" 
# 2 c_2   "b (+2)" 
# 3 c_3   "a (+2)" 
# 4 c_3   " c (+5)"
# 5 c_4   "e (+2)" 
# 6 c_5   "b (+2)" 
# 7 c_5   " e (+5)"

The question is not about splitting one row into multiple rows; my attempt above already does that.

Prradep

3 Answers

Those quotes are not actually in your data. They are only part of how a tibble is printed: the quotes are added to make it visible that some values contain leading whitespace. See below:

library(tidyverse)

x1 <- df %>% separate_rows(b, sep = ",", convert = TRUE)
x2 <- as.data.frame(x1)

x1
# # A tibble: 7 x 2
#   a     b        
#   <chr> <chr>    
# 1 c_1   "a (+1)" 
# 2 c_2   "b (+2)" 
# 3 c_3   "a (+2)" 
# 4 c_3   " c (+5)"
# 5 c_4   "e (+2)" 
# 6 c_5   "b (+2)" 
# 7 c_5   " e (+5)"

x2
#     a       b
# 1 c_1  a (+1)
# 2 c_2  b (+2)
# 3 c_3  a (+2)
# 4 c_3  c (+5)
# 5 c_4  e (+2)
# 6 c_5  b (+2)
# 7 c_5  e (+5)

identical(x1$b, x2$b)
# [1] TRUE
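
As a further check, here is a minimal base R sketch. It assumes the values of column b are exactly those shown in the printed output above (typed in by hand here, so the vector name `b` and the helper variables are only illustrative). It shows that none of the strings contains a quote character, that two of them start with a space, and that trimws() removes that space.

```r
# Values of column b after splitting on "," alone (copied from the output above)
b <- c("a (+1)", "b (+2)", "a (+2)", " c (+5)", "e (+2)", "b (+2)", " e (+5)")

# No element contains a literal double-quote character
has_quote <- grepl('"', b, fixed = TRUE)
any(has_quote)

# Positions of the elements that start with a space
leading_space <- startsWith(b, " ")
which(leading_space)

# trimws() strips leading and trailing whitespace
b_trimmed <- trimws(b)
b_trimmed
```

So the only thing to clean up, if anything, is the leading space, not the quotes.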
zx8754

Add a whitespace after the comma in sep, so that the space is consumed as part of the separator:

tidyr::separate_rows(df, b, sep = ",\\s", convert = TRUE)

#  a     b     
#  <chr> <chr> 
#1 c_1   a (+1)
#2 c_2   b (+2)
#3 c_3   a (+2)
#4 c_3   c (+5)
#5 c_4   e (+2)
#6 c_5   b (+2)
#7 c_5   e (+5)
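
To illustrate what the pattern does, here is a small sketch that uses base R's strsplit() as a stand-in for separate_rows (the input vector `x` is made up for the example, and the second string deliberately has no space after the comma). The pattern ",\\s" requires one whitespace character after the comma, while ",\\s*" also accepts none, which may be the safer choice if the spacing in the data is inconsistent.

```r
# Hypothetical input: the second string has no space after the comma
x <- c("a (+2), c (+5)", "b (+2),e (+5)", "a (+1)")

# ",\\s" matches a comma followed by exactly one whitespace character,
# so the second string is left unsplit
pieces_strict <- strsplit(x, ",\\s")

# ",\\s*" matches a comma followed by zero or more whitespace characters,
# so both spacing styles are split cleanly
pieces_loose <- strsplit(x, ",\\s*")

pieces_strict
pieces_loose
```

Since sep in the call above is already a regular expression, the looser pattern could presumably be passed to separate_rows in the same way.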
Ronak Shah
  • Thanks for the answer; I can accept it after 10 minutes. Could you suggest whether it is possible to extract the number with the plus sign in the same operation? – Prradep Sep 07 '20 at 10:54
  • @Prradep Avoid asking new questions in the comments; post it as a new question. – zx8754 Sep 07 '20 at 10:58

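Here is a data.table option:

library(data.table)

setDT(df)  # convert df to a data.table by reference
# split b on ", " within each group of a; the result column is named V1
df[, strsplit(b, ", "), by = a]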

giving

    a     V1
1: c_1 a (+1)
2: c_2 b (+2)
3: c_3 a (+2)
4: c_3 c (+5)
5: c_4 e (+2)
6: c_5 b (+2)
7: c_5 e (+5)
ThomasIsCoding