Split column using regular expressions in R

Question

I'm trying to split a column in my dataframe into two columns. Values in the column look like this:

column
user_author-5
creator-user-7

The desired result is this:

column            number
user_author         5
creator-user        7

This is what I tried:

df %>%  
  tidyr::extract(col = "column", 
                 into = c("number"), 
                 regex = "-(\\d+)$", 
                 remove = FALSE
                 )

But I get this:

column            number
user_author-5       5
creator-user-7      7

How can I split the column and remove the number from the first column at the same time? The problem is that some values also contain "-" in the text, so I must use the regular expression "-(\d+)$" rather than a plain "-" as the separator. That is the part I find unclear.

`tidyr::separate(df, column,into = c('column', 'number'), sep = '-', convert = TRUE)` — Ronak Shah, Aug 10 '20 at 10:21
@RonakShah The problem here is that there are some "-" in text too, so I must use regular expression "-(\d+)$", not "-". It makes it a little bit unclear to me — french_fries, Aug 10 '20 at 10:25

Ronak Shah · Answer 1 · 2020-08-10T11:51:32.797

1

You can use `extract` like this:

tidyr::extract(df, column, c('column', 'number'), '(.*)-.*?(\\d+)')
#        column number
#1  user_author      5
#2 creator-user      7

In the regex we capture data in two groups. Because `(.*)` is greedy, the first group runs up to the last '-' in the string, and the second group is the number at the end.
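To check which '-' a greedy first group actually stops at, here is a minimal base R sketch. It is not the answer's exact pattern: it assumes an anchored variant, `^(.*)-(\\d+)$`, and the third test value `"a-b-c-42"` is made up for illustration.

```r
# Sample values; the last one is a hypothetical extra case with several '-'
# and a multi-digit number.
x <- c("user_author-5", "creator-user-7", "a-b-c-42")

# Group 1: everything before the last '-' (greedy .*).
# Group 2: the run of digits at the very end of the string.
pattern <- "^(.*)-(\\d+)$"

name   <- sub(pattern, "\\1", x)
number <- as.integer(sub(pattern, "\\2", x))

data.frame(column = name, number = number)
```

Anchoring with `^` and `$` keeps any part of the string from being silently skipped, which may be easier to reason about than an unanchored pattern.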

data

df <- structure(list(column = c("user_author-5", "creator-user-7")), 
class = "data.frame", row.names = c(NA, -2L))
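For comparison, the `separate()` suggestion from the comments could probably be adapted as well. This is only a sketch. It assumes that `sep` in `tidyr::separate()` is interpreted as a Perl-style regular expression, so that a lookahead can restrict the split to the one '-' that is followed by the trailing digits.

```r
library(tidyr)

df <- data.frame(column = c("user_author-5", "creator-user-7"))

# Split only on a '-' that is directly followed by digits running to the
# end of the string; any other '-' in the text is left alone.
res <- separate(df, column, into = c("column", "number"),
                sep = "-(?=\\d+$)", convert = TRUE)
res
```

With `convert = TRUE` the new `number` column should come back as an integer rather than a character column.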

edited Aug 10 '20 at 11:51

answered Aug 10 '20 at 10:26

Ronak Shah


But I want the value in the column to be "creator-user", not "creator", so it still splits the text too. That was the unclear part. – french_fries Aug 10 '20 at 10:31
`tidyr::extract(df, column, c('column', 'number'), '(.*)-.*(\\d+)')` – Edo Aug 10 '20 at 11:18

score 0 · Answer 2 · answered Aug 10 '20 at 13:16


Here is another approach you can try in this case, using `stringr`:

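library(dplyr)    # provides %>% and mutate()
library(stringr)
df2 <- df %>% 
  # str_extract() returns a character vector; str_extract_all() would create a list column
  mutate(colum2 = str_extract(column, regex("(?<=-)\\d{1,}$")))
#           column colum2
# 1  user_author-5      5
# 2 creator-user-7      7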
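The snippet above leaves the original column untouched. If the numeric suffix should also be dropped from it, as the question asks, one possible sketch is to pair `str_extract()` with `str_remove()`. The column name `number` and the integer conversion are choices made here for illustration, not part of the original answer.

```r
library(dplyr)
library(stringr)

df <- data.frame(column = c("user_author-5", "creator-user-7"))

df2 <- df %>%
  mutate(
    # Take the trailing digits first, while `column` still holds the suffix.
    number = as.integer(str_extract(column, "(?<=-)\\d+$")),
    # Then strip the final "-<digits>" from the original text.
    column = str_remove(column, "-\\d+$")
  )
df2
```

The order inside `mutate()` matters: the arguments are evaluated in sequence, so `number` has to be computed before `column` is overwritten.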

answered Aug 10 '20 at 13:16

Tho Vu

