I want to split a column on a delimiter, but only when the delimiter appears in a particular position. I need to separate the film title from the year, and the common delimiter is "(". Some movies have brackets in their title as well, so I want to require that the bracket is followed by a digit, without the digit itself being used as part of the separator.
Here is the code:
imdb_ratings <- imdb_ratings %>% separate(col = title, into = c("title", "year"),
sep = "\\(*[:digit:]")
The result is obviously wrong: all the values in the year column are NA. I already know that my code tries to use the bracket and a digit together as the separator (I guess you can have only one character), but I don't know how to indicate where the bracket should be. I tried something like "\\(?=[:digit:]", but that doesn't work either.
[UPDATE]
Here is my code now:
imdb_ratings <- imdb_ratings %>% filter(Animation == 1 & !str_detect(title, "\\$")) %>%
separate(col = title, into = c("title", "year"),
sep = "\\((?=\\d)")
I wanted to filter out the rows that end with a backslash, because I know they don't have a year. That's why I used !str_detect(title, "\\$"), but it doesn't work: after filtering, the result still contains the rows that have a backslash at the end:
[![enter image description here][1]][1]
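A note on the filter pattern, as a minimal sketch with made-up titles (the vector `titles` is hypothetical, not the real data): in an R string, "\\$" is the regex \$, which matches a literal dollar sign anywhere in the text rather than a backslash at the end. Matching a trailing backslash would need "\\\\$".

```r
# Hypothetical titles: one plain, one ending in a backslash, one with a dollar sign
titles <- c("Toy Story (1995)", "Some Title\\", "Cash $ Carry")

# "\\$" is the regex \$ : a literal dollar sign
has_dollar <- grepl("\\$", titles)

# "\\\\$" is the regex \\$ : a backslash at the end of the string
ends_with_backslash <- grepl("\\\\$", titles)

# Keep only the titles that do not end with a backslash
kept <- titles[!ends_with_backslash]
```

With stringr loaded, the same pattern should work inside the filter as !str_detect(title, "\\\\$").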
[UPDATE2] How can I use the separate function to get the year of the movie into the second column when the bracket is followed not by a year but by some text? The screenshot shows an example, "Aladdin (Video game 1993)". What should I do to get "Aladdin" into the first column and 1993 into the year column? Another option might be to keep the bracketed "Video game" in the first column as well.
[![enter image description here][2]][2]
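To make the goal concrete, here is a rough sketch in base R rather than separate. It assumes the year is always four digits inside the final bracket, and the sample vector `titles` is made up for illustration.

```r
# Hypothetical sample titles, not the real imdb_ratings data
titles <- c("Aladdin (Video game 1993)", "Toy Story (1995)")

# Four digits inside the final bracket: the lookahead requires only
# non-bracket characters and then ")" at the end of the string
year <- regmatches(titles, regexpr("\\d{4}(?=[^()]*\\)$)", titles, perl = TRUE))

# Drop the final bracket (and any space before it) to leave the bare title
title <- sub("\\s*\\([^()]*\\)$", "", titles, perl = TRUE)

data.frame(title, year)
```

Note that regmatches drops elements without a match, so the two vectors only line up when every title contains a year.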
[UPDATE] The regex was working all along, but now R suddenly gives an error for it.
The code has not changed:
imdb <- imdb %>% extract(title, c("title", "year"),
"^(.*?)(?:\\s*\\([^()]*?(\\d{4})[^()]*\\))?$")
The error: Error in drop && length(x) == 1L : invalid 'x' type in 'x && y'
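One possible cause, and this is an assumption rather than something confirmed here, is that another attached package defines its own extract() and masks tidyr's. For example, magrittr::extract is an alias for `[`, which would treat the regex as its drop argument. The sketch below uses a made-up data frame and calls tidyr::extract by its full name to rule that out.

```r
library(tidyr)

# Hypothetical sample data standing in for the real imdb data frame
imdb <- data.frame(title = c("Aladdin (Video game 1993)", "Toy Story (1995)"),
                   stringsAsFactors = FALSE)

# Lists every attached package that defines an object named extract
find("extract")

# Naming the package explicitly avoids picking up a masking extract()
result <- tidyr::extract(imdb, title, c("title", "year"),
                         "^(.*?)(?:\\s*\\([^()]*?(\\d{4})[^()]*\\))?$")
result
```

If find("extract") lists more than one package, the order in which the packages were attached decides which extract() a bare call uses.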