Similar questions have been asked here and here. However, I can't seem to get them to work for me.

If I have a character vector like:

myString <- c("5", "10", "100\abc\nx1\n1")

I want to remove everything after (and including) the first backslash. For example, my expected result would be:

> myString
[1] "5"   "10"  "100"

I have tried using sub, gsub, and strsplit but I just can't seem to get it to work. Things I've tried:

gsub("\\\\*", "", myString)
sub("\\\\.*", "", myString)
gsub('\\"', "", myString, fixed = TRUE)
gsub("\\.*","", myString)

But I'm not great with regex stuff so I'm almost definitely not using these functions correctly! Any advice as to how I'd fix this?

Electrino
  • Your `myString` does not contain a `\`. If it should contain a `\`, you can use e.g. `myString <- c("5", "10", "100\\abc\\nx1\\n1")` – GKi Sep 01 '21 at 15:45
  • Explanation of why this is happening: https://stackoverflow.com/questions/25424382/replace-single-backslash-in-r – Skaqqs Sep 01 '21 at 15:45
  • Ah ok... So by my understanding, what I really have in `myString[3]` is separate characters like "100", "\abc", "\nx1", and "\n1". The issue is, I'm extracting a giant string from a model output and trying to separate it, so the single backslashes are part of the string I extract. – Electrino Sep 01 '21 at 15:57
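
To check what the comments describe, here is a minimal base R sketch. It assumes the string is typed as a literal, as in the question; in that case `\a` and `\n` are parsed as single control characters (bell and newline), so no literal backslash ends up in the string. The variable names `has_backslash` and `code_points` are only for illustration.

```r
myString <- c("5", "10", "100\abc\nx1\n1")

# Look for a literal backslash in each element
# (fixed = TRUE means the pattern is a plain one-character string, not a regex)
has_backslash <- grepl("\\", myString, fixed = TRUE)
print(has_backslash)

# Code points of the third element: 7 is the bell character (\a), 10 is newline (\n)
code_points <- utf8ToInt(myString[3])
print(code_points)

# Each escape sequence counts as a single character
print(nchar(myString[3]))
```

If `has_backslash` is all `FALSE`, patterns such as `"\\\\.*"` have nothing to match, which would explain why the attempts in the question leave the vector unchanged.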

3 Answers

Here is another way you could try:

gsub("(^\\d+)([\a-zA-Z0-9]*)", "\\1", myString)

[1] "5"   "10"  "100"
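
A rough sketch of why this pattern can match, assuming the regex engine compares range endpoints by code point: inside the R string, `\a` is the single bell character, so `[\a-zA-Z0-9]` starts with a range running from bell to `z`. That range also covers newline and even a literal backslash. The names `lo`, `hi`, `nl`, and `bs` are only for illustration.

```r
# "\a" in an R string literal is one character: BEL
lo <- utf8ToInt("\a")
hi <- utf8ToInt("z")

# Newline and a literal backslash both fall between those two code points
nl <- utf8ToInt("\n")
bs <- utf8ToInt("\\")

newline_in_range   <- nl >= lo && nl <= hi
backslash_in_range <- bs >= lo && bs <= hi
print(c(lo = lo, hi = hi, nl = nl, bs = bs))
print(c(newline_in_range, backslash_in_range))
```

So the second group is much broader than it looks: it would also swallow spaces and most ASCII punctuation after the leading digits.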
Anoushiravan R
The information from @Skaqqs led me to something helpful by @bartektartanus. It's not base R, unfortunately, but I think this should work, using the stringi package to escape the Unicode so that the control characters become literal backslash sequences:

library(stringi)
myString <- c("5", "10", "100\abc\nx1\n1")
gsub("\\\\.*", "", stri_escape_unicode(myString))

result:

 "5"   "10"  "100"
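
For a base R variant of the same idea, one possible sketch is to cut at the first control character instead of escaping it first. This assumes the "backslashes" in the real data are control characters such as `\a` and `\n`, as in the example; if the data holds literal backslashes, `sub("\\\\.*", "", myString)` from the question should already work. The name `cleaned` is only for illustration.

```r
myString <- c("5", "10", "100\abc\nx1\n1")

# [[:cntrl:]] matches control characters such as \a and \n;
# (?s) lets "." also match newlines when perl = TRUE
cleaned <- sub("(?s)[[:cntrl:]].*", "", myString, perl = TRUE)
print(cleaned)
# [1] "5"   "10"  "100"
```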
Silentdevildoll
We could use parse_number from readr, which returns a numeric vector rather than character:

readr::parse_number(myString)
[1]   5  10 100
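
If adding readr is not an option, a similar result can be sketched in base R by extracting the first run of digits from each element. This is not how parse_number works internally, just a simpler stand-in that assumes plain unsigned integers; `first_digits` and `as_numbers` are illustrative names.

```r
myString <- c("5", "10", "100\abc\nx1\n1")

# regexpr() finds the first run of digits in each element;
# regmatches() pulls out the matched text
first_digits <- regmatches(myString, regexpr("[0-9]+", myString))
print(first_digits)

# Convert to numeric, comparable to the parse_number result above
as_numbers <- as.numeric(first_digits)
print(as_numbers)
```

Note that regmatches() drops elements with no match, so the result can be shorter than the input if some strings contain no digits.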
akrun