17

I want to filter out the rows of a table whose string value in one column ends with '*'. Only that column needs to be checked.

 string_name = c("aaaaa", "bbbbb", "ccccc", "dddd*", "eee*eee")

 zz <- sapply(tx$variant_full_name, function(x) {substrRight(x, -1) =="*"})
 Error in FUN(c("Agno I30N", "VP2 E17Q", "VP2 I204*", "VP3 I85F", "VP1 K73R",  : 
   could not find function "substrRight"

With this, the 4th value of zz should be TRUE.

In Python, strings have an endswith method [ string_s.endswith('*') ]. Is there something similar in R?

Also, could the problem be that '*' is treated as a special character rather than as a literal asterisk? grepl is not working either.

> grepl("*^",'dddd*')
[1] TRUE
> grepl("*^",'dddd')
[1] TRUE
user2864740
Malaya
    You can escape the `*`: `grepl("\\*", 'dddd*')`. To find strings that end with a `*`, use `grepl("\\*$", string_name)`. – jdharrison Oct 04 '14 at 01:11

5 Answers

20

Base R now contains startsWith and endsWith (since R 3.3.0). Thus the OP's question can be answered with endsWith:

> string_name = c("aaaaa", "bbbbb", "ccccc", "dddd*", "eee*eee")
> endsWith(string_name, '*')
[1] FALSE FALSE FALSE  TRUE FALSE

This is much faster than substring(string_name, nchar(string_name)) == '*'.
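
To tie this back to the original goal of dropping table rows, here is a minimal sketch. The data frame `tx` and its column `variant_full_name` come from the error message in the question; the five values are only the ones visible there, and the names `has_star` and `kept` are made up for illustration.

```r
# Toy stand-in for the asker's table (values copied from the error message).
tx <- data.frame(
  variant_full_name = c("Agno I30N", "VP2 E17Q", "VP2 I204*", "VP3 I85F", "VP1 K73R"),
  stringsAsFactors = FALSE  # endsWith() needs character vectors, not factors
)

# TRUE for rows whose value ends with a literal "*"
has_star <- endsWith(tx$variant_full_name, "*")

# Keep only the rows that do NOT end with "*"; the "VP2 I204*" row is dropped
kept <- tx[!has_star, , drop = FALSE]
kept$variant_full_name
```

`drop = FALSE` keeps the result a data frame even when the table has a single column.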

Vidhya G
15

* is a quantifier in regular expressions. It tells the regular expression engine to attempt to match the preceding token "zero or more times". To match a literal *, you need to precede it with two backslashes (\\*) or place it inside a character class ([*]). To check whether a string ends with a specific pattern, use the end-of-string anchor $.

> grepl('\\*$', c('aaaaa', 'bbbbb', 'ccccc', 'dddd*', 'eee*eee'))
# [1] FALSE FALSE FALSE  TRUE FALSE
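
As a small sketch of the two other ways to get a literal asterisk, the block below uses the character class mentioned above and, as an addition not in the original answer, `grepl`'s `fixed = TRUE` argument. The variable names `ends_cls` and `has_any` are illustrative only.

```r
x <- c("aaaaa", "bbbbb", "ccccc", "dddd*", "eee*eee")

# Character class instead of backslashes: [*] matches a literal asterisk,
# and $ anchors the match to the end of the string.
ends_cls <- grepl("[*]$", x)
ends_cls
# [1] FALSE FALSE FALSE  TRUE FALSE

# fixed = TRUE treats the pattern as plain text, so "*" is literal.
# Note that this tests for "*" anywhere in the string, not only at the end.
has_any <- grepl("*", x, fixed = TRUE)
has_any
# [1] FALSE FALSE FALSE  TRUE  TRUE
```

With `fixed = TRUE` there are no anchors, so it suits a "contains" check rather than an "ends with" check.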

You can also do this in base R without a regular expression:

> x <- c('aaaaa', 'bbbbb', 'ccccc', 'dddd*', 'eee*eee')
> substr(x, nchar(x)-1+1, nchar(x)) == '*'
# [1] FALSE FALSE FALSE  TRUE FALSE
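
The `-1+1` in the start index reads as the general form `nchar(x) - n + 1` with `n = 1`. A sketch of that generalization for a longer suffix follows; the suffix "eee" and the names `suffix`, `n`, `last_n` and `ends_eee` are assumptions chosen for illustration.

```r
x <- c("aaaaa", "bbbbb", "ccccc", "dddd*", "eee*eee")

suffix <- "eee"      # hypothetical suffix to look for
n <- nchar(suffix)   # number of trailing characters to compare

# Extract the last n characters of every element
last_n <- substr(x, nchar(x) - n + 1, nchar(x))

# Compare them with the suffix
ends_eee <- last_n == suffix
ends_eee
# [1] FALSE FALSE FALSE FALSE  TRUE
```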
hwnd
8

This is simple enough that you don't need regular expressions.

> string_name = c("aaaaa", "bbbbb", "ccccc", "dddd*", "eee*eee")
> substring(string_name, nchar(string_name)) == "*"
[1] FALSE FALSE FALSE  TRUE FALSE
Hong Ooi
5

I use something like this:

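strEndsWith <- function(haystack, needle)
{
  hl <- nchar(haystack)
  nl <- nchar(needle)
  # Vectorized over haystack: a needle longer than the string can never match.
  # (A scalar if() would fail when haystack has more than one element.)
  ifelse(nl > hl, FALSE, substr(haystack, hl - nl + 1, hl) == needle)
}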
krokodil
0

Here is a tidyverse solution, using str_sub from the stringr package:

library(stringr)
string_name = c("aaaaa", "bbbbb", "ccccc", "dddd*", "eee*eee")
str_sub(string_name, -1) == "*"
[1] FALSE FALSE FALSE  TRUE FALSE

It has the benefit of being more readable, and it can easily be changed if a different position in the string needs to be checked.

leo