
I'm trying to split a column in my dataframe into two columns. The values in the column look like this:

column
user_author-5
creator-user-7

The desired result is this:

column            number
user_author         5
creator-user        7

I do this:

df %>%  
  tidyr::extract(col = "column", 
                 into = c("number"), 
                 regex = "-(\\d+)$", 
                 remove = FALSE
                 ) 

But I get this:

column            number
user_author-5       5
creator-user-7      7

How can I split the column and remove the number from the first column at the same time? The text itself contains some "-" characters, so I can't simply split on "-"; I have to use the regular expression "-(\d+)$", and it's unclear to me how to do both with it.

french_fries
  • `tidyr::separate(df, column,into = c('column', 'number'), sep = '-', convert = TRUE)` – Ronak Shah Aug 10 '20 at 10:21
  • @RonakShah The problem here is that there are some "-" in text too, so I must use regular expression "-(\d+)$", not "-". It makes it a little bit unclear to me – french_fries Aug 10 '20 at 10:25

2 Answers


You can use extract like this:

tidyr::extract(df, column, c('column', 'number'), '(.*)-.*?(\\d+)')
#        column number
#1  user_author      5
#2 creator-user      7

The regex captures the data in two groups. Because '(.*)' is greedy, the first group runs up to the last '-', and the second group is the trailing number.
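To make the greedy matching explicit, here is a minimal sketch of the same idea with the pattern anchored at both ends. The anchors and convert = TRUE are my additions, not part of the answer above; convert = TRUE is assumed to be wanted so that number comes back as an integer rather than a character column.

```r
library(tidyr)

# Sample data, same values as in the question.
df <- data.frame(column = c("user_author-5", "creator-user-7"))

# "^(.*)" is greedy, so it consumes everything up to the LAST hyphen;
# "(\\d+)$" then captures the digits at the end of the string.
# convert = TRUE converts the captured digits to an integer column.
result <- tidyr::extract(df, column,
                         into = c("column", "number"),
                         regex = "^(.*)-(\\d+)$",
                         convert = TRUE)
print(result)
```

Rows that do not end in a hyphen followed by digits would not match the anchored pattern and should come back as NA in both columns, so it may be worth checking for those first.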

data

df <- structure(list(column = c("user_author-5", "creator-user-7")), 
class = "data.frame", row.names = c(NA, -2L))
Ronak Shah

Another approach you can try in this case uses stringr:

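library(dplyr)    # provides mutate() and %>%
library(stringr)
df2 <- df %>% 
  # str_extract() returns a character vector (str_extract_all() would give a list column);
  # the lookbehind (?<=-) keeps the hyphen out of the match
  mutate(colum2 = str_extract(column, "(?<=-)\\d+$"))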
#           column colum2
# 1  user_author-5      5
# 2 creator-user-7      7
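The result above still has the number attached to the original column. A possible extension, sketched here as an assumption about what the question wants rather than as part of this answer, is to strip the suffix with str_remove() in the same mutate() call. The column name number is chosen to match the question's desired output.

```r
library(dplyr)
library(stringr)

# Sample data, same values as in the question.
df <- data.frame(column = c("user_author-5", "creator-user-7"))

df2 <- df %>%
  mutate(
    # Pull out the trailing digits first, while `column` still has them.
    number = as.integer(str_extract(column, "(?<=-)\\d+$")),
    # Then drop the final "-<digits>" suffix from the original column.
    column = str_remove(column, "-\\d+$")
  )
print(df2)
```

The order inside mutate() matters here: number has to be computed before column is overwritten, because mutate() evaluates its arguments sequentially.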
Tho Vu