A regex to remove the pattern "[0-9]g"

Question

I have the following sample dataset:

XYZ 185g
ABC 60G
Gha 20g

How do I remove the strings "185g", "60G" and "20g" without accidentally removing the letters g and G inside the words themselves? I tried the code below, but it replaces those letters in the words as well.

a$words <- str_replace_all(a$words, "[0-9]", " ")  # blank out the digits
a$words <- str_replace_all(a$words, "[gG]", " ")   # blanks every g/G, including those inside words
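A minimal base-R sketch of why two separate replacements fail (it uses `gsub()` in place of `str_replace_all()`, and `words` is a hypothetical stand-in for `a$words`): the second pattern has no context, so it cannot tell the unit letter in `20g` from the `G` at the start of `Gha`.

```r
# Hypothetical stand-in for a$words
words <- c("XYZ 185g", "ABC 60G", "Gha 20g")

step1 <- gsub("[0-9]", " ", words)  # digits become spaces
step2 <- gsub("[gG]", " ", step1)   # every g/G becomes a space, including the G in "Gha"
print(trimws(step2))
```

Even after trimming the padding, the third element comes out as `"ha"` rather than `"Gha"`.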

score 10 · Accepted Answer · answered Sep 06 '21 at 11:40

You need to combine both patterns into a single regex, something like

a$words <- str_replace_all(a$words,"\\s*\\d+[gG]$", "")

The \s*\d+[gG]$ regex matches

\s* - zero or more whitespace characters
\d+ - one or more digits
[gG] - g or G
$ - end of string.
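For readers without stringr, the same pattern can be checked with base R's `gsub()` (a sketch; `words` is a hypothetical stand-in for `a$words`). Only a trailing digit run followed by `g`/`G` is removed, so the `G` in `Gha` survives because no digits precede it.

```r
words <- c("XYZ 185g", "ABC 60G", "Gha 20g")

# \s* optional whitespace, \d+ digits, [gG] unit letter, $ end of string
cleaned <- gsub("\\s*\\d+[gG]$", "", words, perl = TRUE)
print(cleaned)
#> [1] "XYZ" "ABC" "Gha"
```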

If these substrings can also occur in the middle of a string, not just at the end, you may use

a$words <- str_replace_all(a$words,"\\s*\\d+[gG]\\b", "")

where $ is replaced with \b, a word boundary.
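A small base-R comparison of the two anchors, using hypothetical inputs that are not from the question: `$` only removes a weight at the very end of the string, while `\b` also removes one followed by more text, yet still leaves `185grams` alone because `g` is followed by another letter there.

```r
x <- c("XYZ 185g pack", "Gha 20g", "XYZ 185grams")  # hypothetical inputs

anchored <- gsub("\\s*\\d+[gG]$", "", x, perl = TRUE)   # must end the string
bounded  <- gsub("\\s*\\d+[gG]\\b", "", x, perl = TRUE) # must end a word
print(anchored)
print(bounded)
```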

To ignore case, wrap the pattern in regex() with ignore_case = TRUE:

a$words <- str_replace_all(a$words, regex("\\s*\\d+g\\b", ignore_case=TRUE), "")
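In base R, a comparable effect can be had with the inline `(?i)` flag in a PCRE pattern (`perl = TRUE`). This is a different mechanism from stringr's `regex(ignore_case = TRUE)`, shown here only as a sketch; `words` is again a hypothetical stand-in for `a$words`.

```r
words <- c("XYZ 185g", "ABC 60G", "Gha 20g")

# (?i) turns on case-insensitive matching for the rest of the pattern,
# so a lowercase g in the pattern also matches G
cleaned <- gsub("(?i)\\s*\\d+g\\b", "", words, perl = TRUE)
print(cleaned)
#> [1] "XYZ" "ABC" "Gha"
```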

score 3 · Answer 2 · answered Sep 06 '21 at 12:19


You can try

> gsub("\\s\\d+g$", "", c("XYZ 185g", "ABC 60G", "Gha 20g"), ignore.case = TRUE)
[1] "XYZ" "ABC" "Gha"
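Note that this pattern uses `\s` (exactly one whitespace character) where the accepted answer uses `\s*`. A quick sketch with a hypothetical input that has no space before the weight shows the difference:

```r
x <- c("Gha 20g", "Gha20g")  # second element is hypothetical

one_space <- gsub("\\s\\d+g$", "", x, ignore.case = TRUE)   # space required
opt_space <- gsub("\\s*\\d+g$", "", x, ignore.case = TRUE)  # space optional
print(one_space)
print(opt_space)
```

With `\s`, `"Gha20g"` is left unchanged; with `\s*`, both elements become `"Gha"`.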

answered Sep 06 '21 at 12:19

ThomasIsCoding



Nice to see you again my friend :) – Anoushiravan R Sep 06 '21 at 22:15

score 3 · Answer 3 · answered Sep 06 '21 at 22:14


You can also use the following solution:

vec <- c("XYZ 185g", "ABC 60G", "Gha 20g")

gsub("[A-Za-z]+(*SKIP)(*FAIL)|[ 0-9Gg]+", "", vec, perl = TRUE)

[1] "XYZ" "ABC" "Gha"

answered Sep 06 '21 at 22:14

Anoushiravan R



What is this `*SKIP`/`*FAIL` part? – Martin Gal Sep 06 '21 at 22:22

You can find detailed explanation here: https://stackoverflow.com/questions/24534782/how-do-skip-or-f-work-on-regex and also https://stackoverflow.com/questions/68239696/how-to-only-remove-single-parenthesis-and-keep-the-paired-ones – Anoushiravan R Sep 06 '21 at 22:29

and also this: https://stackoverflow.com/questions/19992984/verbs-that-act-after-backtracking-and-failure/20008790#20008790 The explanations there are so detailed and good that I'll leave it to the true masters of regex, including dear Wiktor Stribiżew. – Anoushiravan R Sep 06 '21 at 22:31

The `[A-Za-z]+(*SKIP)(*FAIL)|[ 0-9Gg]+` works for the current input. It first matches one or more ASCII letters with `[A-Za-z]+`; once they are consumed, the match is failed, the regex index stays after the last consumed letter, and the next match attempt starts right there. That is why `[ 0-9Gg]+` can only start matching at a space or digit, but then it can match any one or more digits, spaces, or `g`/`G`. [This](https://regex101.com/r/mUYAFS/1) is a demo of what the regex does. – Wiktor Stribiżew Sep 07 '21 at 06:51
@WiktorStribiżew Thank you very much for your explanation. So I guess if I used a pattern like `\\s+\\d+[gG]\\b`, just like your second solution, for the alternative, it may solve the problem. – Anoushiravan R Sep 07 '21 at 10:14

@AnoushiravanR Sure, if there is any problem :) – Wiktor Stribiżew Sep 07 '21 at 10:19
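The caveat "works for the current input" can be made concrete with a hypothetical input such as `"Big Egg 20g"`, which is not from the question. The character class `[ 0-9Gg]+` also consumes the space between two words, whereas the variant suggested in the comments only removes a whitespace-plus-digits run ending in `g`/`G`. A minimal sketch:

```r
x <- "Big Egg 20g"  # hypothetical input with two words containing g

# Original alternative: any run of spaces, digits and g/G after a skipped word
skip_fail <- gsub("[A-Za-z]+(*SKIP)(*FAIL)|[ 0-9Gg]+", "", x, perl = TRUE)

# Suggested alternative: whitespace, digits, then g/G at a word boundary
skip_fail_fixed <- gsub("[A-Za-z]+(*SKIP)(*FAIL)|\\s+\\d+[gG]\\b", "", x, perl = TRUE)

print(skip_fail)
print(skip_fail_fixed)
```

The first call glues the words into `"BigEgg"`; the second keeps `"Big Egg"`. Both still reduce the question's three sample strings to `"XYZ"`, `"ABC"` and `"Gha"`.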
