
I am trying to do an exact pattern match using gsub/sub and replace, but I am not getting the desired result. I want to remove the .x and .y suffixes from the names without affecting the other names.

name = c("company", "deriv.x", "isConfirmed.y")
new.name = gsub(".x$|.y$", "", name)
new.name
[1] "compa"       "deriv"       "isConfirmed"

company has become compa.

I have also tried

remove = c(".x", ".y")
replace(name, name %in% remove, "")
[1] "company"    "deriv.x"    "isConfirmed.y"

I would like the outcome to be "company", "deriv", "isConfirmed".

How do I solve this problem?

Emma
Jordan Wrong

3 Answers


A simple expression can remove the undesired . and anything after it:

(.+?)(?:\..+)?

or for exact match:

(.+?)(?:\.x|\.y)?

R Test

Your code might look something like:

gsub("(.+?)(?:\\..+)?", "\\1", "deriv.x")

or

gsub("(.+?)(?:\\.x|\\.y)?", "\\1", "deriv.x")

Description

Here we have a capturing group (.+?), which holds the desired output, and an optional non-capturing group (?:\..+)?, which consumes the undesired . and everything after it.
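
As a rough sketch of how this capture-group idea could be applied to the whole vector from the question: the version below assumes you only want to strip a trailing .x or .y, so it anchors the pattern with ^ and $ and uses sub() with perl = TRUE. The anchors, the [xy] class and the variable names are my additions, not part of the expressions above.

```r
name <- c("company", "deriv.x", "isConfirmed.y")

# Lazily capture the base name, then allow an optional literal ".x" or ".y"
# suffix. Anchoring with ^ and $ makes the pattern cover the whole string.
# perl = TRUE selects the PCRE engine (an assumption made for this sketch).
pattern <- "^(.+?)(?:\\.[xy])?$"

# "\\1" keeps only the captured base name.
new_name <- sub(pattern, "\\1", name, perl = TRUE)
print(new_name)
```

Names without a .x or .y suffix pass through unchanged, because the suffix group is optional.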

Emma

The dot matches any character except a newline, so .x$|.y$ would also match the ny in company.

There is no need for any grouping structure to match a dot followed by x or y. You could match a dot and match either x or y using a character class:

\\.[xy]

And replace with an empty string:

name = c("company", "deriv.x", "isConfirmed.y")
new.name = gsub("\\.[xy]", "", name)
new.name

Result

[1] "company"     "deriv"       "isConfirmed"
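
One caveat, shown as a small sketch: without a $ anchor, \\.[xy] also matches in the middle of a name. The name "tax.year" below is a made-up example that is not in the question; it is only there to show the difference.

```r
# "tax.year" is a hypothetical extra name, added only for illustration.
name <- c("company", "deriv.x", "isConfirmed.y", "tax.year")

# Unanchored: every literal ".x" or ".y" is removed, wherever it occurs.
unanchored <- gsub("\\.[xy]", "", name)

# Anchored: only a ".x" or ".y" at the very end of the string is removed.
anchored <- sub("\\.[xy]$", "", name)

print(unanchored)
print(anchored)
```

If the names can contain .x or .y elsewhere, the anchored form is probably the safer choice.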
The fourth bird

In a regex, . represents "any character". In order to recognize literal . characters, you need to escape the character, like so:

name <- c("company", "deriv.x", "isConfirmed.y")
new.name <- gsub("\\.x$|\\.y$", "", name)
new.name

[1] "company"     "deriv"       "isConfirmed"

This explains why, in your original example, "company" became "compa": the pattern .y$ matched any character (here 'n') followed by 'y' at the end of the string, so "ny" was deleted.
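
To see this for yourself, one way (a minimal sketch using base R's regexpr() and regmatches(); the variable names are mine) is to extract what each pattern actually matches instead of replacing it:

```r
name <- c("company", "deriv.x", "isConfirmed.y")

# regmatches() with regexpr() returns the matched text, dropping
# elements where the pattern did not match at all.
unescaped <- regmatches(name, regexpr(".x$|.y$", name))
escaped <- regmatches(name, regexpr("\\.x$|\\.y$", name))

print(unescaped)  # includes "ny" from "company"
print(escaped)    # only the literal ".x" and ".y" suffixes
```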

Onyambu's comment would also work, since within the [ ] portion of a regex, . is interpreted literally.

gsub("[.](x|y)$", "", name)
jdobres