I found a few questions heading in this direction, but I could not apply their solutions to my specific problem: I have quite a messy address column in a data frame. A cell can be empty, contain only numbers, or contain numbers and text combined, and there can be one or more special characters in between.
As a first step, I want to split every value at the first special character. I tried various options that work partially. However, the problem seems to be that some cells don't contain any special characters, which causes an error in the function.
For example, the following code puts only the special character in the new column b, but does not really split the column:
df <- df %>%
  separate(address, into = c("a", "b"), sep = "[^[:punct:]]+", remove = FALSE)
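A small base R check may help show why this attempt behaves that way. This is only a sketch on a few made-up values (`x`, `non_punct` and `punct` are illustrative names, not part of the original code): the leading `^` inside the brackets negates the class, so the pattern matches runs of non-punctuation characters, and those runs are what gets treated as the separator.

```r
# Illustrative values only; not the real address column
x <- c("97/7", "17/7-8", "7E")

# "[^[:punct:]]+" is a NEGATED class: it matches runs of characters
# that are not punctuation, i.e. the digits and letters themselves
non_punct <- regmatches(x, gregexpr("[^[:punct:]]+", x))

# "[[:punct:]]" (no "^") matches the punctuation characters
punct <- regmatches(x, gregexpr("[[:punct:]]", x))

print(non_punct)
print(punct)
```

With the negated pattern as `sep`, the digits and letters are consumed as separators and only the punctuation is left over as a piece, which would explain why column b ends up holding just the special character.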
So, ideally, what I want to achieve is the following: if there is a special character in the cell, split it at the first special character, with everything left of it in column a and everything right of it in column b. If there is no special character, put the whole value in column a and NA in column b.
Do I have to wrap my code in an ifelse statement? Or are there any other suggestions?
Thanks!
Edit: as requested, some sample data:
library(dplyr)
test <- as.data.frame(c("2", "97/7", "17/7-8", "7E", "800E/7", "17", "", "0", "2/15", "17+18", "17/7/8", "19", "2/2/4", "9-7/8")) %>%
  rename(address = 1)
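The desired behavior can be sketched on this sample data as follows. This is one possible approach rather than a definitive answer. It assumes that "special character" means anything in the POSIX class `[[:punct:]]`, and that the installed tidyr is recent enough for `separate()` to accept the `extra` and `fill` arguments. The name `result` is illustrative. Which characters count as punctuation can vary between regex engines, so a symbol such as `+` is worth checking on your own setup.

```r
library(dplyr)
library(tidyr)

# Same sample values as above, built directly as a one-column data frame
test <- data.frame(
  address = c("2", "97/7", "17/7-8", "7E", "800E/7", "17", "", "0",
              "2/15", "17+18", "17/7/8", "19", "2/2/4", "9-7/8")
)

result <- test %>%
  separate(
    address,
    into   = c("a", "b"),
    sep    = "[[:punct:]]",  # split ON punctuation (no "^" negation)
    extra  = "merge",        # split only at the first match; keep the rest in b
    fill   = "right",        # no punctuation at all: b becomes NA, no warning
    remove = FALSE           # keep the original address column
  )

print(result)
```

With `extra = "merge"` and `fill = "right"`, no ifelse wrapper should be needed: values without punctuation stay whole in column a with NA in column b, and values with several special characters are split only at the first one.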