The error appears because some values in the input vector contain no letters (nor any of the symbols that [A-z] also matches, which in ASCII are [, \, ], ^, _ and `). For those values regmatches returns nothing at all, so the number of matches no longer coincides with the number of rows in the data frame, and assigning the result to the column becomes impossible.
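To see the length mismatch for yourself, here is a minimal sketch. It assumes the extraction was done with regexpr and a pattern along the lines of ^[A-Za-z]+ (the exact call from the question may differ), and it reuses the sample values from below:

```r
# Sample values: the first one contains no letters at all
x <- c("------", "CHELSEAFC17FEB640CE", "BARCAFC17FEB1400CE")

# regmatches() silently drops the elements that did not match
m <- regmatches(x, regexpr("^[A-Za-z]+", x))

length(x)  # 3 input values
length(m)  # only 2 matches, so they cannot fill a 3-row column
```

Assigning m to a column of a 3-row data frame is what triggers the error.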
What you may do is:
1) Use sub
> x <- c("------", "CHELSEAFC17FEB640CE", "BARCAFC17FEB1400CE")
> df <- data.frame(x)
> sub("^([a-zA-Z]+).*|.*", "\\1", df$x)
[1] ""          "CHELSEAFC" "BARCAFC"
>
# In your case (x being your data frame with a symbol column):
x$symbol <- sub("^([a-zA-Z]+).*|.*", "\\1", x$symbol)
The ^([a-zA-Z]+).*|.* pattern will match and capture one or more ASCII letters (replace [a-zA-Z]+ with [[:alpha:]]+ to match letters other than ASCII, too) at the start of the string (^), and .* will match the rest of the string, OR (|) the whole string will get matched by the second branch. In both cases the match is replaced with the capturing group contents, so the result is either the leading letters or an empty string.
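A small sketch of why the second branch matters, using the same sample values (the variable names without_alt and with_alt are only for illustration): without |.* the pattern does not match "------" at all, and sub returns such a value unchanged.

```r
x <- c("------", "CHELSEAFC17FEB640CE", "BARCAFC17FEB1400CE")

# Without the second branch: non-matching values come back untouched
without_alt <- sub("^([a-zA-Z]+).*", "\\1", x)
without_alt
# [1] "------"    "CHELSEAFC" "BARCAFC"

# With the second branch: non-matching values are replaced with ""
with_alt <- sub("^([a-zA-Z]+).*|.*", "\\1", x)
with_alt
# [1] ""          "CHELSEAFC" "BARCAFC"
```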
2) If you want to keep NA for the values with no match, use stringr str_extract:
library(stringr)
> x$symbol <- str_extract(x$symbol, "^[A-Za-z]+")
## => 1 <NA>
## 2 CHELSEAFC
## 3 BARCAFC
Note that ^[A-Za-z]+ matches 1+ ASCII letters ([A-Za-z]+) at the start of the string only (^).
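If you would rather not depend on stringr, something similar can probably be done in base R. This is only a sketch of an alternative, not part of the approach above; out is a name made up for the example:

```r
x <- c("------", "CHELSEAFC17FEB640CE", "BARCAFC17FEB1400CE")

# regexpr() returns -1 for the elements with no match
m <- regexpr("^[A-Za-z]+", x)

# Start with all NA, then fill in only the positions that matched
out <- rep(NA_character_, length(x))
out[m != -1] <- regmatches(x, m)
out
# [1] NA          "CHELSEAFC" "BARCAFC"
```

This works because regmatches returns the matches in the same order as the elements that matched, so they line up with the m != -1 positions.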