7

I have the following sample dataset:

XYZ 185g
ABC 60G
Gha 20g

How do I remove the strings "185g", "60G", and "20g" without accidentally removing the letters g and G in the main words? I tried the code below, but it replaces those letters in the main words as well.

a <- str_replace_all(a$words,"[0-9]"," ")
a <- str_replace_all(a$words,"[gG]"," ")
ThomasIsCoding
Shalvaze

3 Answers

10

You need to combine them into a single pattern, something like

a$words <- str_replace_all(a$words,"\\s*\\d+[gG]$", "")

The \s*\d+[gG]$ regex matches

  • \s* - zero or more whitespace characters
  • \d+ - one or more digits
  • [gG] - g or G
  • $ - end of string.
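As a quick check of the breakdown above, here is a minimal sketch in base R. It uses `gsub()` with `perl = TRUE` instead of `str_replace_all()` so that it runs without loading stringr; that swap, the variable names, and the extra "Egg 12G" item are assumptions added for illustration and are not part of the question's data.

```r
# Sample data from the question, plus one hypothetical item ("Egg 12G")
# whose main word itself contains the letter g.
words <- c("XYZ 185g", "ABC 60G", "Gha 20g", "Egg 12G")

# Naive approach: deleting every digit and every g/G also damages the main words.
naive <- gsub("[0-9gG]", "", words)

# Combined pattern: optional whitespace, digits, then g or G at the end of the string.
cleaned <- gsub("\\s*\\d+[gG]$", "", words, perl = TRUE)

print(naive)
print(cleaned)
```

Because the digits and the trailing g/G must appear together at the end of the string, the letters inside "Gha" and "Egg" are left alone.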

If you can have these strings inside a string, not just at the end, you may use

a$words <- str_replace_all(a$words,"\\s*\\d+[gG]\\b", "")

where $ is replaced with a \b, a word boundary.
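The difference between the `$` and `\b` versions can be seen on a few made-up strings. This is only a sketch in base R (`gsub()` with `perl = TRUE` rather than stringr), and the `items` vector is hypothetical, not taken from the question.

```r
# Hypothetical inputs where the weight is not always at the end of the string.
items <- c("XYZ 185g bag", "ABC 60G, Gha 20g", "Mega 5good")

# End-anchored pattern: only removes a weight that ends the string.
end_only <- gsub("\\s*\\d+[gG]$", "", items, perl = TRUE)

# Word-boundary pattern: removes a weight wherever the g/G ends a word.
anywhere <- gsub("\\s*\\d+[gG]\\b", "", items, perl = TRUE)

print(end_only)
print(anywhere)
```

The last item is untouched by both patterns: in "5good" the g is followed by another word character, so there is no word boundary after it.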

To ignore case,

a$words <- str_replace_all(a$words, regex("\\s*\\d+g\\b", ignore_case=TRUE), "")
Wiktor Stribiżew
3

You can try

> gsub("\\s\\d+g$", "", c("XYZ 185g", "ABC 60G", "Gha 20g"), ignore.case = TRUE)
[1] "XYZ" "ABC" "Gha"
ThomasIsCoding
3

You can also use the following solution:

vec <- c("XYZ 185g", "ABC 60G", "Gha 20g")

gsub("[A-Za-z]+(*SKIP)(*FAIL)|[ 0-9Gg]+", "", vec, perl = TRUE)

[1] "XYZ" "ABC" "Gha"
Anoushiravan R
  • 1
    What is this `*SKIP`/`*FAIL` part? – Martin Gal Sep 06 '21 at 22:22
  • 2
    You can find a detailed explanation here: https://stackoverflow.com/questions/24534782/how-do-skip-or-f-work-on-regex and also https://stackoverflow.com/questions/68239696/how-to-only-remove-single-parenthesis-and-keep-the-paired-ones – Anoushiravan R Sep 06 '21 at 22:29
  • 1
    And also this: https://stackoverflow.com/questions/19992984/verbs-that-act-after-backtracking-and-failure/20008790#20008790. The explanations are so detailed and good that I will just leave it in the hands of true masters of regex, including dear Wiktor Stribiżew. – Anoushiravan R Sep 06 '21 at 22:31
  • 1
    The `[A-Za-z]+(*SKIP)(*FAIL)|[ 0-9Gg]+` works for the current input. It matches one or more ASCII letters (with `[A-Za-z]+`) first, and once consumed, the match is failed, the regex index remains after the last letter consumed, and the new match search starts right there. That is why `[ 0-9Gg]+` only starts matching with a space or digit, but then it can match any one or more digits, spaces, or `g`/`G`. [This](https://regex101.com/r/mUYAFS/1) is a demo of what the regex does. – Wiktor Stribiżew Sep 07 '21 at 06:51
  • @WiktorStribiżew Thank you very much for your explanation. So I guess if I used a pattern like `\\s+\\d+[gG]\\b`, just like your second solution, for the alternative, it may solve the problem. – Anoushiravan R Sep 07 '21 at 10:14
  • 1
    @AnoushiravanR Sure, if there is any problem :) – Wiktor Stribiżew Sep 07 '21 at 10:19
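
The behaviour discussed in these comments can be tried out with a short sketch. The "XYZ 185g bag" input is a hypothetical string added here to show where the `(*SKIP)(*FAIL)` pattern and the `\\s+\\d+[gG]\\b` pattern diverge; it does not come from the question.

```r
vec <- c("XYZ 185g", "ABC 60G", "Gha 20g")

# Letter runs are matched first and then skipped, so the second alternative
# can only start at a space or digit, never inside a word.
skip_pattern <- "[A-Za-z]+(*SKIP)(*FAIL)|[ 0-9Gg]+"
skip_result <- gsub(skip_pattern, "", vec, perl = TRUE)

# Hypothetical input with text after the weight: the second alternative also
# consumes the spaces on both sides, so the two words are glued together.
skip_mid <- gsub(skip_pattern, "", "XYZ 185g bag", perl = TRUE)

# The more specific pattern from the comments only removes " 185g".
boundary_mid <- gsub("\\s+\\d+[gG]\\b", "", "XYZ 185g bag", perl = TRUE)

print(skip_result)
print(skip_mid)
print(boundary_mid)
```

On the question's data both approaches agree; the difference only shows up when something follows the weight.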