I would like to know how to replace strings based on different conditions and then group the results together with dplyr.
For example,

[Image: description of how I want to extract from the given dataset]

I treat FRAUD and NARC differently because I think NARC-SELL and NARC-POSSESS are meaningfully different, while the kind of drug involved is not important.
Thanks for the help!

Chloe

2 Answers

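You'll want to use a regex like `NARC-[A-Z]+|FRAUD`: either NARC followed by a dash and one or more capital letters, or FRAUD. Wrapping that in a capture group and adding `.*` lets gsub() replace the whole string with just the matched prefix; strings that match neither alternative are left unchanged.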

library(dplyr)
d <- data.frame(x = c("FRAUD-CREDIT CARD",
                      "HOMICIDE-JUST-GUN",
                      "NARC-POSSESS-PILL/TABLET",
                      "NARC-SELL-HEROIN"))
d %>%
  mutate(y = gsub("^(NARC-[A-Z]+|FRAUD).*", "\\1",  x))
#                          x                 y
# 1        FRAUD-CREDIT CARD             FRAUD
# 2        HOMICIDE-JUST-GUN HOMICIDE-JUST-GUN
# 3 NARC-POSSESS-PILL/TABLET      NARC-POSSESS
# 4         NARC-SELL-HEROIN         NARC-SELL
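
Since the question also asks about grouping, here is a minimal sketch of one way to count rows per collapsed category once `y` exists. The extra `NARC-SELL-COCAINE` row and the `n_cases` column name are made up for illustration and are not part of the original data.

```r
library(dplyr)

# Example data from the answer, plus one hypothetical row so that a
# group ends up with more than one member
d <- data.frame(x = c("FRAUD-CREDIT CARD",
                      "HOMICIDE-JUST-GUN",
                      "NARC-POSSESS-PILL/TABLET",
                      "NARC-SELL-HEROIN",
                      "NARC-SELL-COCAINE"),   # hypothetical extra row
                stringsAsFactors = FALSE)

counts <- d %>%
  # collapse FRAUD-* and NARC-<ACTION>-* to their prefix, keep the rest as-is
  mutate(y = gsub("^(NARC-[A-Z]+|FRAUD).*", "\\1", x)) %>%
  # count() is shorthand for group_by(y) %>% summarise(n = n())
  count(y, name = "n_cases")   # n_cases is an illustrative column name

counts
```

If you need other summaries per category, `group_by(y)` followed by `summarise()` should work the same way.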
Weihuang Wong

You can also use str_extract(), from stringr:

# using Weihuang Wong's nice example data

library(dplyr)
library(stringr)

d <- data.frame(x = c("FRAUD-CREDIT CARD",
                      "HOMICIDE-JUST-GUN",
                      "NARC-POSSESS-PILL/TABLET",
                      "NARC-SELL-HEROIN"))

pattern <- "^(NARC-\\w+|FRAUD|HOMICIDE-\\w+-\\w+)"

d %>% mutate(y = str_extract(x, pattern))

#                          x                 y
# 1        FRAUD-CREDIT CARD             FRAUD
# 2        HOMICIDE-JUST-GUN HOMICIDE-JUST-GUN
# 3 NARC-POSSESS-PILL/TABLET      NARC-POSSESS
# 4         NARC-SELL-HEROIN         NARC-SELL
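
str_extract() returns NA for strings that do not match, which is why the pattern above has to spell out the HOMICIDE case. If listing every other category is impractical, one possible variation (a sketch, not something from the original answer) is to fall back to the original value with dplyr's coalesce(). The name `res` is just for illustration.

```r
library(dplyr)
library(stringr)

d <- data.frame(x = c("FRAUD-CREDIT CARD",
                      "HOMICIDE-JUST-GUN",
                      "NARC-POSSESS-PILL/TABLET",
                      "NARC-SELL-HEROIN"),
                stringsAsFactors = FALSE)   # keep x as character for coalesce()

# Only the categories that should be collapsed are listed here
pattern <- "^(NARC-\\w+|FRAUD)"

res <- d %>%
  # str_extract() gives NA where nothing matches; coalesce() then
  # substitutes the original string for those rows
  mutate(y = coalesce(str_extract(x, pattern), x))

res
```

With this, any category not mentioned in the pattern passes through unchanged, the same way it does in the gsub() approach.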
andrew_reece
  • What do `\\` and `w+-` mean after `HOMICIDE-`? – Chloe May 28 '18 at 14:48
  • `\w` is a [regex shorthand](https://stackoverflow.com/a/342977/2799941) for `[a-zA-Z0-9_]`. The extra `\` is an escape character: inside an R string literal a backslash must itself be escaped, so `"\\w"` is the regex `\w`. The `+` means "one or more". The `-` is just a hyphen with no special meaning here. It may help to read up on regex syntax; there are plenty of good references and tutorials online that will get you more familiar with these patterns. – andrew_reece May 28 '18 at 16:02
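
To make the escaping point concrete, here is a small base R sketch (the variable names are only for illustration). It shows that the string `"\\w+"` contains a single backslash once R has parsed it, and that `\w+` stops at the first character that is not a letter, digit, or underscore.

```r
# The R string literal "\\w+" holds three characters: \ w +
pat <- "\\w+"
n_chars <- nchar(pat)
writeLines(pat)   # prints the regex as the regex engine receives it

# \w+ matches letters, digits and underscore, so it stops at "-" or "/"
s <- "POSSESS-PILL/TABLET"
first_word <- regmatches(s, regexpr(pat, s))
first_word
```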