15

So I have the following data, let's say called "my_data":

Storm.Type
TYPHOON
SEVERE STORM
TROPICAL STORM
SNOWSTORM AND HIGH WINDS

I want to classify whether or not each element in my_data$Storm.Type is a storm, BUT I don't want to count tropical storms as storms (I'm going to classify them separately), so that I end up with:

Storm.Type                    Is.Storm
TYPHOON                       0
SEVERE STORM                  1
TROPICAL STORM                0
SNOWSTORM AND HIGH WINDS      1

I have written the following code:

my_data$Is.Storm  <-  my_data[grep("(?<!TROPICAL) (?i)STORM"), "Storm.Type"]

But this only flags "SEVERE STORM" as a storm and leaves out "SNOWSTORM AND HIGH WINDS". Thank you!

Jonathan Charlton
  • What's the point of `(?i)` in your regexp? The problem is that you're looking for the string `" STORM"` with a preceding space, so `"SNOWSTORM"` does not qualify. – Blue Magister Nov 22 '13 at 20:58
  • Hi Blue. While I accepted Ben's answer, you've actually gotten to the heart of the problem with MY code. I'd like to make my code so that it doesn't care about that space (so if STORM is in THUNDERSTORM, or SNOWSTORM, I want that as well as STORM on its own). Do you know how I'd get rid of that space that my code is looking for? The point of the `(?i)` is the off-chance that someone entered a STORM as "storm" or "Storm" or "sToRm", etc. – Jonathan Charlton Nov 22 '13 at 21:01

3 Answers

11

The problem is that you're looking for the string " STORM" with a preceding space, so "SNOWSTORM" does not qualify.

As a fix, consider moving the space into your negative lookbehind assertion, like so:

ss <- c("TYPHOON","SEVERE STORM","TROPICAL STORM","SNOWSTORM AND HIGH WINDS",
        "THUNDERSTORM")
grep("(?<!TROPICAL )(?i)STORM", ss, perl = TRUE)
# [1] 2 4 5
grepl("(?<!TROPICAL )(?i)STORM", ss, perl = TRUE)
# [1] FALSE  TRUE FALSE  TRUE  TRUE

I didn't know that (?i) and (?-i) set whether you ignore case or not in regex. Cool find. Another way to do it is the ignore.case flag:

grepl("(?<!tropical )storm", ss, perl = TRUE, ignore.case = TRUE)
# [1] FALSE  TRUE FALSE  TRUE  TRUE
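One caveat that may be worth checking, shown here as a small sketch: an inline `(?i)` only affects the part of the pattern that comes after it, so in `(?<!TROPICAL )(?i)STORM` the lookbehind itself stays case-sensitive. The `mixed` vector below is made-up test data, not something from the question.

```r
# Hypothetical mixed-case inputs (not from the original data)
mixed <- c("Tropical Storm", "tropical storm", "TROPICAL STORM", "Snowstorm")

# (?i) placed after the lookbehind: only STORM is matched case-insensitively,
# so a lowercase or mixed-case "tropical " is not recognised by the lookbehind
late_flag <- grepl("(?<!TROPICAL )(?i)STORM", mixed, perl = TRUE)

# (?i) placed at the start: the lookbehind is case-insensitive as well
early_flag <- grepl("(?i)(?<!TROPICAL )STORM", mixed, perl = TRUE)

print(late_flag)
print(early_flag)
```

If mixed-case entries are possible, putting `(?i)` first or using `ignore.case = TRUE` (which applies to the whole pattern) is probably the safer choice.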

Then define your column:

my_data$Is.Storm  <-  grepl("(?<!tropical )storm", my_data$Storm.Type,
                            perl = TRUE, ignore.case = TRUE)
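
The expected output in the question uses 0/1 rather than TRUE/FALSE. A minimal sketch of that variant, assuming `my_data` is a data frame with a character column `Storm.Type` (it is rebuilt here from the example values so the snippet runs on its own):

```r
# Rebuild the example data from the question
my_data <- data.frame(
  Storm.Type = c("TYPHOON", "SEVERE STORM", "TROPICAL STORM",
                 "SNOWSTORM AND HIGH WINDS"),
  stringsAsFactors = FALSE
)

# grepl() returns TRUE/FALSE; as.integer() maps that to 1/0
my_data$Is.Storm <- as.integer(
  grepl("(?<!tropical )storm", my_data$Storm.Type,
        perl = TRUE, ignore.case = TRUE)
)

print(my_data)
```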
Blue Magister
  • You're welcome. Thanks for teaching me about `(?i)`. I'm used to using the `ignore.case` argument, but `(?i)` is more flexible for general PCRE expressions. – Blue Magister Nov 22 '13 at 21:11
3

I'm not that good at regexps either, but what's wrong with

ss <- c("TYPHOON","SEVERE STORM","TROPICAL STORM","SNOWSTORM AND HIGH WINDS")
grepl("STORM",ss) & !grepl("TROPICAL STORM",ss)
## [1] FALSE  TRUE FALSE  TRUE

... ?

Ben Bolker
0

Something like this, which returns the indices of the elements that contain STORM but not TROPICAL:

x <- my_data$Storm.Type
grep("STORM", x)[!grep("STORM", x) %in% grep("TROPICAL", x)]
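
Because `grep()` gives positions rather than one value per row, an extra step is needed to get the 0/1 column the question asks for. A rough sketch, using `setdiff()` in place of the `%in%` subsetting (a swapped-in but equivalent way to drop the TROPICAL indices) and the question's example values for `x`:

```r
x <- c("TYPHOON", "SEVERE STORM", "TROPICAL STORM", "SNOWSTORM AND HIGH WINDS")

# Indices of elements that contain STORM but not TROPICAL
storm_idx <- setdiff(grep("STORM", x), grep("TROPICAL", x))

# Turn the indices into a 0/1 vector with one entry per element of x
is_storm <- as.integer(seq_along(x) %in% storm_idx)

print(storm_idx)
print(is_storm)
```

Note that this excludes any element containing TROPICAL anywhere, not only the exact phrase "TROPICAL STORM".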
ndr