
I have a dataset where some of the columns have special characters. I want to clean this dataset and remove these special characters from all the columns that have them. A subset of the column names is below:

Crorf64[,1]
ADF3
DF41c[,1]
AGGF3[,1]
SRGHJ

My desired output is:

Crorf64
ADF3
DF41c
AGGF3
SRGHJ

My attempt to remove the parts of the column names with the special characters is below, using this answer as a guide https://stackoverflow.com/a/37801926/17054028:

training%>%
  mutate(col=str_remove_all(col,"[.*"))

I get an error when I use this:

Error in `mutate()`:
! Problem while computing `col = str_remove_all(col, "[.*")`.
Caused by error in `stri_replace_all_regex()`:
! argument `str` should be a character vector (or an object coercible to)

Any other alternatives to perform this task are welcome.

thole
  • Escape the `[` `\\[.*` – GKi Apr 03 '23 at 09:31
  • @GKi I tried ```training%>% mutate(col=str_remove_all(col,"\\[.*"))``` but I am still getting an error: ```Error in `mutate()`: ! Problem while computing `col = str_remove_all(col, "\\[.*")`. Caused by error in `stri_replace_all_regex()`: ! argument `str` should be a character vector (or an object coercible to)``` – thole Apr 03 '23 at 09:35
  • Maybe try: `sub("\\[.*", "", c("Crorf64[,1]", "ADF3"))` ? – GKi Apr 03 '23 at 09:37
  • And `stringr::str_remove_all(c("Crorf64[,1]", "ADF3"), "\\[.*")` works also. – GKi Apr 03 '23 at 09:40

1 Answer


I suspect this is the same issue as in your other question, so try to prevent these special characters from being created in the first place.

If they were created by a function that returns a matrix (such as `scale()`), note that `[,1]` is not actually part of the column name. It only indicates that the column is a matrix:

library(tidyverse)

df <- tibble(
  Crorf64 = 1,
  ADF3 = 2
)

# scale() returns a one-column matrix, so the tibble prints the column as Crorf64[,1]
df <- df |> 
  mutate(across(1, scale))

df
#> # A tibble: 1 × 2
#>   Crorf64[,1]  ADF3
#>         <dbl> <dbl>
#> 1         NaN     2

df |> colnames()
#> [1] "Crorf64" "ADF3"

Created on 2023-04-03 with reprex v2.0.2
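
One way to avoid the matrix column in the first place, assuming the columns really do come from `scale()`, is to drop the matrix structure inside the same `across()` call. This is only a sketch: the three-row data and the name `scaled` are made up for illustration.

```r
library(tidyverse)

# illustrative data (not from the question)
df <- tibble(
  Crorf64 = c(1, 2, 3),
  ADF3 = c(4, 5, 6)
)

# scale() returns a one-column matrix; as.numeric() strips the matrix
# structure so the result is stored as a plain numeric vector
scaled <- df |>
  mutate(across(1, \(x) as.numeric(scale(x))))

is.matrix(scaled$Crorf64)
#> [1] FALSE
```

Because the column is an ordinary vector, the header should print as `Crorf64` without the `[,1]` suffix.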

If you can't address this issue earlier in your pipeline, you can extract the column from the matrix after the fact like so:


df |> 
  mutate(across(1, \(x) x[,1]))
#> # A tibble: 1 × 2
#>   Crorf64  ADF3
#>     <dbl> <dbl>
#> 1     NaN     2
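
If several columns are affected and you don't want to list them by position, a possible generalisation is to select every matrix column with `where(is.matrix)`. This is a sketch that assumes each affected column is a one-column matrix, as `scale()` produces; the column names `a`, `b`, `c` and the object `flat` are hypothetical.

```r
library(tidyverse)

# illustrative data: a and b become one-column matrices, c stays character
df <- tibble(a = c(1, 2, 3), b = c(4, 5, 6), c = c("x", "y", "z")) |>
  mutate(across(c(a, b), scale))

# keep only the first (and only) column of every matrix column
flat <- df |>
  mutate(across(where(is.matrix), \(x) x[, 1]))

any(sapply(flat, is.matrix))
#> [1] FALSE
```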
dufei