MarkerName Allele1 Allele2 Weight Zscore P-value Direction
10:1167075 a g 218.00 2.446 0.01446 ?+
7:77652992 t c 218.00 2.076 0.03789 ?-
X:24811075 a g 315.00 2.463 0.01378 +?
4:15645706 t c 315.00 2.582 0.009817 -?
5:13478320 g a 315.00 2.872 0.00222 ++

I am trying to subset a data frame with this format to remove all rows that contain a ?. The issue that I am running into is that the +/- signs are being recognized as operator symbols and R is giving me the following error:

Error: invalid regular expression '?+', reason 'Invalid use of repetition operators'

My goal is to have a data frame that looks like this:

MarkerName Allele1 Allele2 Weight Zscore P-value Direction
5:13478320 g a 315.00 2.872 0.002 ++
Ava Wilson
  • `?` is a reserved character in regex, meaning the previous character/group is optional (0 or 1). You can escape it `"\\?"` or use `fixed=TRUE`. https://stackoverflow.com/a/22944075/3358272 is a good reference for things like that. – r2evans Nov 02 '22 at 17:55
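
A minimal base R sketch of the two options the comment mentions, assuming a small made-up data frame `df` whose `Direction` column mirrors the question's values (the helper names `has_q`, `clean`, and `clean_regex` are illustrative, not from the original post):

```r
# Toy data mirroring the question's Direction column (illustrative values)
df <- data.frame(
  MarkerName = c("10:1167075", "X:24811075", "5:13478320"),
  Direction  = c("?+", "+?", "++"),
  stringsAsFactors = FALSE
)

# Option 1: fixed = TRUE treats "?" as a literal character,
# so it is no longer parsed as a regex quantifier
has_q <- grepl("?", df$Direction, fixed = TRUE)
clean <- df[!has_q, , drop = FALSE]

# Option 2: escape the "?" inside a regular expression
clean_regex <- df[!grepl("\\?", df$Direction), , drop = FALSE]
```

Both approaches should keep only the `++` row; `fixed = TRUE` is probably the simpler choice when the pattern is a plain string rather than a real regular expression.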

1 Answer


This should do:

library(dplyr); library(stringr)  # provide filter() and str_detect()
df %>% filter(str_detect(Direction, "\\?", negate = TRUE))

Example:

d = tibble(a = 1:3, b = c('+?', '?+', '++'))

      a b    
  <int> <chr>
1     1 +?   
2     2 ?+   
3     3 ++ 


d %>% filter(str_detect(b, "\\?", negate = TRUE))


      a b    
  <int> <chr>
1     3 ++
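
As a possible variation on the same idea, stringr's `fixed()` wrapper matches the `?` literally, so no escaping is needed. This is only a sketch on the same toy tibble; the names `res` and `res_negate` are made up for illustration:

```r
library(dplyr)
library(stringr)

# Same toy tibble as in the example above
d <- tibble(a = 1:3, b = c("+?", "?+", "++"))

# fixed("?") compares the pattern as a literal string, not as a regex
res <- d %>% filter(!str_detect(b, fixed("?")))

# Equivalent form using the negate argument of str_detect()
res_negate <- d %>% filter(str_detect(b, fixed("?"), negate = TRUE))
```

Either form should leave only the row where `b` is `++`.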
Juan C