Try the function extract() from tidyr (part of the tidyverse):
library(tidyverse)
df %>%
  extract(movies_name,
          into = c("title", "year"),
          regex = "(\\D+)\\s\\((\\d+)\\)")
title year
1 City of Lost Children, The (Cité des enfants perdus, La) 1995
2 another film 2020
How the regex works:
(\\D+): first capture group, matching one or more characters that are not digits
\\s\\(: a whitespace character and an opening parenthesis (not captured)
(\\d+): second capture group, matching one or more digits
\\): a closing parenthesis (not captured)
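To see the two capture groups on their own, here is a minimal base R sketch (no tidyverse needed). It is only an illustration of the pattern, not what extract() does internally; the names x, pattern, m and parts are made up for this example.

```r
# Same strings as in "Data 1", kept as a plain character vector.
x <- c("City of Lost Children, The (Cité des enfants perdus, La) (1995)",
       "another film (2020)")
pattern <- "(\\D+)\\s\\((\\d+)\\)"

# regexec() locates the full match and each capture group;
# regmatches() turns those positions into the matched strings.
m <- regmatches(x, regexec(pattern, x, perl = TRUE))

# In each result, element 1 is the full match, 2 the title, 3 the year.
parts <- data.frame(
  title = vapply(m, `[`, character(1), 2),
  year  = vapply(m, `[`, character(1), 3)
)
parts
```

Note that year comes back as a character column here, just as it does with extract() unless you ask for conversion.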
Data 1:
df <- data.frame(
  movies_name = c("City of Lost Children, The (Cité des enfants perdus, La) (1995)",
                  "another film (2020)")
)
EDIT:
Okay, following a comment, let's make this a little more complex by including a title that itself contains digits:
Data 2:
df <- data.frame(
  movies_name = c("City of Lost Children, The (Cité des enfants perdus, La) (1995)",
                  "another film (2020)",
                  "Under Siege 2: Dark Territory (1995)")
)
Solution - actually easier than the previous one ;) Just swap \\D+ for .+ so that the title may contain digits:
df %>%
  extract(movies_name,
          into = c("title", "year"),
          regex = "(.+)\\s\\((\\d+)\\)")
title year
1 City of Lost Children, The (Cité des enfants perdus, La) 1995
2 another film 2020
3 Under Siege 2: Dark Territory 1995
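Why the second regex copes with digits in the title: \\D+ cannot cross the "2", so the first pattern can only start matching after it, whereas .+ is greedy, matches any character, and backtracks just far enough for the trailing " (1995)" to fit. A small base R sketch comparing the two (capture_groups is a hypothetical helper written for this illustration, not a tidyr function):

```r
x <- "Under Siege 2: Dark Territory (1995)"

# Hypothetical helper: return only the capture groups of the first match.
capture_groups <- function(pattern, text) {
  regmatches(text, regexec(pattern, text, perl = TRUE))[[1]][-1]
}

# First pattern: the match can only begin after the "2", so the title is cut.
old <- capture_groups("(\\D+)\\s\\((\\d+)\\)", x)

# Second pattern: .+ takes everything up to the final " (digits)".
new <- capture_groups("(.+)\\s\\((\\d+)\\)", x)

old  # ": Dark Territory" "1995"
new  # "Under Siege 2: Dark Territory" "1995"
```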