
I have a data frame as follows:

  1     Tertiary seen.
  2     No tertiary seen.
  3     No anything seen.
  4     Tertiary everywhere.

I want to add a column that contains "Tertiary" only when Tertiary is seen, and NA when the regex No.*\. matches.

  1     Tertiary seen.          Tertiary
  2     No tertiary seen.       NA
  3     No anything seen.       NA
  4     Tertiary everywhere.    Tertiary

I know I can use | for "or" in str_extract, but & does not seem to be accepted as an "and" operator. This is what I tried:

Mydata$newcol<-str_extract(Mydata$Text,"[Tt]ertiary&!No.*[Tt]ertiary\\.")
Amanda
Sebastian Zeki

2 Answers


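You can try a negative lookbehind for that, which skips any "tertiary" directly preceded by "No ", something like

Mydata$newcol <- NA_character_  # start with NA in every row
Mydata$newcol[grepl("(?<!No )[Tt]ertiary", Mydata$Text, perl = TRUE)] <- "Tertiary"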
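A variant that anchors a negative lookahead at the start of the string also rejects lines where "No" is separated from "tertiary" by other words. This is a minimal sketch in base R, assuming the Mydata data frame and Text column from the question; the sample rows are copied from the question and the variable hit is only illustrative.

```r
# Sample data mirroring the question (column name Text is taken from the question)
Mydata <- data.frame(
  Text = c("Tertiary seen.", "No tertiary seen.",
           "No anything seen.", "Tertiary everywhere."),
  stringsAsFactors = FALSE
)

# ^(?!No\b) rejects strings that start with the word "No";
# .*[Tt]ertiary then requires "tertiary" somewhere in the string.
hit <- grepl("^(?!No\\b).*[Tt]ertiary", Mydata$Text, perl = TRUE)

# Fill the new column: "Tertiary" where both conditions hold, NA otherwise.
Mydata$newcol <- ifelse(hit, "Tertiary", NA_character_)
```

perl = TRUE is needed because lookarounds are not supported by R's default regex engine.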
David Arenburg

An "AND" pattern can be represented as a "NOT (NOT A OR NOT B)" pattern. See also "Regular Expressions: Is there an AND operator?" on Stack Overflow.

library(dplyr)
library(stringr)

Mydata <- tibble(  # tibble() replaces the deprecated data_frame()
  Text = c("Tertiary seen.",
           "No tertiary seen.",
           "No anything seen.",
           "Tertiary everywhere.")
  )

Mydata %>% 
  mutate(
    newcol = str_extract(Text, "^(^[Tt]ertiary|^No.*[Tt]ertiary\\.)")
  )
# A tibble: 4 × 2
#   Text                 newcol
#   <chr>                <chr>
# 1 Tertiary seen.       Tertiary
# 2 No tertiary seen.    <NA>
# 3 No anything seen.    <NA>
# 4 Tertiary everywhere. Tertiary
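Another way to get an "AND" without packing everything into one regex is to test each condition separately and combine the logical vectors with &. The following is a rough sketch rather than the answer's own method; it assumes the same four strings as above, and the names keep and newcol are only illustrative.

```r
library(stringr)

Text <- c("Tertiary seen.", "No tertiary seen.",
          "No anything seen.", "Tertiary everywhere.")

# Condition A: the string contains "tertiary" (either case).
# Condition B: the string does NOT match the "No ... ." pattern.
keep <- str_detect(Text, "[Tt]ertiary") & !str_detect(Text, "^No.*\\.")

# Extract the match only where both conditions hold; NA elsewhere.
newcol <- ifelse(keep, str_extract(Text, "[Tt]ertiary"), NA_character_)
```

Keeping the two tests separate makes each pattern easy to read and change independently, at the cost of scanning the strings more than once.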
Keiku