Apologies if this has a simple solution; I'm still new to R.

I have a dataset that has some columns containing both strings and numbers (https://i.stack.imgur.com/MjzvI.png)

I'd like to convert all of the "-" and "&" string values to the numeric values -998 and -999 respectively, but I cannot find a solution that achieves this.

I've tried doing

df[df=="-"] = -998
df[df=="&"] = -999

but I receive "Error in vec_equal(): ! Can't combine ..1 and ..2 ."

I've also tried putting "-998" in quotes, thinking I could convert it to numeric afterwards, but I received the same error. The same happened when I used the "which" function.

Thomas
  • It's easier to help you if you include a simple [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) with sample input and desired output that can be used to test and verify possible solutions. Please [do not post code or data in images](https://meta.stackoverflow.com/q/285551/2372064) – MrFlick Mar 30 '23 at 17:08
  • Thanks for the info! I've not heard of a fixed match so will have to read more into that. As for the code however I'm unable to get your example to run, getting an error "Unexpected }" but am unable to figure out where the syntax error is – Thomas Mar 30 '23 at 17:18
  • Thanks for the clarification, that fixed the syntax error but it now throws "Error in replace(x, grepl("&", -999)) : argument "values" is missing, with no default" – Thomas Mar 30 '23 at 17:24
  • Ah sorry I'm unfamiliar with the functions you used so wasn't able to pick up on the typos, with the corrected code it presents a similar error to that from the beginning "Error in `[<-`: ! Can't convert `value` to ." – Thomas Mar 30 '23 at 17:33
  • @Thomas I assumed you had character columns. If there are numeric columns as well, convert them to character first in the above code, i.e. `df[] <- lapply(df, function(x) {x <- as.character(x); x <- replace(x, grepl("[-]", x), -998); replace(x, grepl("[&]", x), -999)}); df <- type.convert(df, as.is = TRUE)` – akrun Mar 30 '23 at 17:39

3 Answers

In Tidyverse syntax, you could try

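library(tidyverse)

df <- tibble(
  CITY_1 = c("&", "9", "-"),
  STATE_1 = c("57", "5&", "71")
)

# Replace "-" first: the replacement "-999" itself contains "-", so handling
# "&" first would turn every -999 into -998 in the second step.
df |> 
  mutate(across(everything(), \(x) if_else(str_detect(x, fixed("-")), "-998", x))) |> 
  mutate(across(everything(), \(x) if_else(str_detect(x, fixed("&")), "-999", x))) |> 
  # convert the character columns to numeric, as asked in the question
  mutate(across(everything(), as.numeric))
#> # A tibble: 3 × 2
#>   CITY_1 STATE_1
#>    <dbl>   <dbl>
#> 1   -999      57
#> 2      9    -999
#> 3   -998      71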

Created on 2023-03-30 with reprex v2.0.2

dufei

Using stringi::stri_replace_all_regex, with sprintf to build the patterns:

cols <- c("V1", "V2", "V3", "V4", "V5")
dat[cols] <- lapply(dat[cols], \(x) as.numeric(
  stringi::stri_replace_all_regex(x, 
                                  pattern=sprintf('.*%s.*', c('-', '&')),
                                  replacement=c(-998, -999), vectorize_all=FALSE)))
dat
#     V1 V2   V3 V4 V5
# 1    9  5   55  1  2
# 2    9  5   57  1  3
# 3    9  5 -999  1  5
# 4 -999  7   71  1  6
# 5 -998  7   71  1  6
# 6 -998  7 -999  1  6

Data:

dat <- read.table(text='
9 5 55 1 2
9 5 57 1 3
9 5 5& 1 5
& 7 71 1 6
- 7 71 1 6
- 7 5& 1 6
')
jay.sf

Another base R approach, using @jay.sf's starting dat:

dat[sapply(dat, grepl, pattern = "-")] <-  -998
dat[sapply(dat, grepl, pattern = "&")] <-  -999
dat
#     V1 V2   V3 V4 V5
# 1    9  5   55  1  2
# 2    9  5   57  1  3
# 3    9  5 -999  1  5
# 4 -999  7   71  1  6
# 5 -998  7   71  1  6
# 6 -998  7 -999  1  6

Or if you want one code path (perhaps you have more patterns to recode/replace),

ptns <- list("-"=-998, "&"=-999)
Reduce(function(X, i) {
  X[sapply(X, grepl, pattern = names(ptns)[i])] <- ptns[[i]]
  X
}, seq_along(ptns), init = dat)
#     V1 V2   V3 V4 V5
# 1    9  5   55  1  2
# 2    9  5   57  1  3
# 3    9  5 -999  1  5
# 4 -999  7   71  1  6
# 5 -998  7   71  1  6
# 6 -998  7 -999  1  6
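
Note that either variant leaves V1 and V3 as character columns, because assigning a number into a character column coerces the number to a string. A minimal sketch of converting afterwards, assuming every remaining value parses as a number (the data is re-created here only so the snippet runs on its own):

```r
# Same sample data as in @jay.sf's answer, re-created so this runs standalone
dat <- read.table(text = "
9 5 55 1 2
9 5 57 1 3
9 5 5& 1 5
& 7 71 1 6
- 7 71 1 6
- 7 5& 1 6
")

dat[sapply(dat, grepl, pattern = "-")] <- -998
dat[sapply(dat, grepl, pattern = "&")] <- -999

# V1 and V3 are still character at this point
sapply(dat, class)

# Convert every column to numeric; anything that is not a number would become NA
dat[] <- lapply(dat, as.numeric)
str(dat)
```

If some columns should stay as text, restrict the lapply() call to the columns you want converted instead of using dat[].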

In both cases, the patterns are treated as regular expressions, so if they ever contain regex special characters (including but not limited to ., ?, *, [, ( and )), you'll need to escape them, perhaps using stringr::str_escape.
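
As a possible alternative to escaping, base R's grepl() accepts fixed = TRUE, which matches the pattern as a literal string. The sketch below assumes that is what you want; the helper name recode_literal and the toy data are hypothetical and only for illustration:

```r
# Hypothetical helper: replace every cell that contains a literal marker string
recode_literal <- function(data, ptns) {
  for (p in names(ptns)) {
    # fixed = TRUE matches p literally, so ".", "?", "(" etc. need no escaping
    hit <- sapply(data, grepl, pattern = p, fixed = TRUE)
    data[hit] <- ptns[[p]]
  }
  # turn columns that now hold only numbers back into numeric/integer
  type.convert(data, as.is = TRUE)
}

# Made-up data using "." and "(" as the markers
toy <- data.frame(a = c("1", ".", "3"), b = c("4", "5(", "6"),
                  stringsAsFactors = FALSE)
res <- recode_literal(toy, list("." = -998, "(" = -999))
res
```

As with the versions above, the markers are applied in the order given, so a replacement value should not itself contain a marker that is processed later.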

r2evans