
If a feature has a match for a regex, I would like to use the value of the match to populate a new feature; otherwise the new feature should be NA.

I found this post and tried to use the answer for my problem.

library(dplyr)
library(stringr)

dat.p <- dat.p %>%
  mutate(
    cad = ifelse(str_locate(text_field, "\\[[^]]*\\]"), 
                 str_extract(text_field, "\\[[^]]*\\]"),
                 NA)
    )

That is, if the regex \\[[^]]*\\] matches within text_field, use the matched value in the new column cad; otherwise set cad to NA.

When I run it I get error:

Error: wrong result size (1000000), expected 500000 or 1

How do I do this?

Some example data:

df <- data.frame(
  id = 1:2,
  sometext = c("[cad] apples", "bannanas")
)

df.desired <- data.frame(
  id = 1:2,
  sometext = c("[cad] apples", "bannanas"),
  cad = c("[cad]", NA)
)
Andrie
Doug Fir

2 Answers


I don't know why you bother with mutate and an ifelse when it's a one-liner, using the fact that str_extract will give you an NA if it extracts nothing:

> df$cad = str_extract(df$sometext,"\\[[^]]*\\]")
> df
  id     sometext   cad
1  1 [cad] apples [cad]
2  2     bannanas  <NA>
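
If you would rather stay inside a dplyr pipeline, the same idea carries over. This is a minimal sketch using the example data from the question; it relies only on str_extract returning NA for non-matching elements, so no ifelse is involved:

```r
library(dplyr)
library(stringr)

# Example data from the question
df <- data.frame(
  id = 1:2,
  sometext = c("[cad] apples", "bannanas")
)

# str_extract() returns NA where the pattern does not match,
# so the new column can be built directly inside mutate()
df <- df %>%
  mutate(cad = str_extract(sometext, "\\[[^]]*\\]"))
```
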

You can debug R by trying expressions individually and seeing what happens. For example, the first argument to your ifelse is this:

> str_locate(df$sometext,"\\[[^]]*\\]")
     start end
[1,]     1   5
[2,]    NA  NA

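which is clearly not going to work as the first argument of an ifelse: it is a two-column matrix of start and end positions, so it holds twice as many values as there are rows (hence 1000000 where 500000 was expected) rather than one TRUE/FALSE per row. So why did you think it did?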
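
If you do want to keep the ifelse form, its first argument needs to be a logical vector with one entry per element. A sketch of that, assuming stringr's str_detect as the test (the variable names pattern, has_match and cad are only for illustration):

```r
library(stringr)

sometext <- c("[cad] apples", "bannanas")
pattern <- "\\[[^]]*\\]"

# str_locate() gives a matrix with start and end columns: two values per element
dim(str_locate(sometext, pattern))

# str_detect() gives one TRUE/FALSE per element, which is what ifelse() expects
has_match <- str_detect(sometext, pattern)

# Use the extracted match where there is one, NA otherwise
cad <- ifelse(has_match, str_extract(sometext, pattern), NA)
```

The ifelse is still redundant here, since str_extract alone already yields NA for the non-matching elements; the point is only to show what shape the test has to be.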

Spacedman
> df$cad <- regmatches(df$sometext, gregexpr("\\[[^]]*\\]", df$sometext))
> df
  id     sometext   cad
1  1 [cad] apples [cad]
2  2     bannanas 
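
Note that gregexpr returns a list, and non-matching elements come back as empty character vectors rather than NA. If an explicit NA is wanted without stringr, one possible base R sketch uses regexpr (first match only) and fills in just the rows that matched; the names pattern and m are illustrative:

```r
# Example data from the question, kept as character rather than factor
df <- data.frame(
  id = 1:2,
  sometext = c("[cad] apples", "bannanas"),
  stringsAsFactors = FALSE
)

pattern <- "\\[[^]]*\\]"

# regexpr() finds the first match per element, or -1 where there is none
m <- regexpr(pattern, df$sometext)

# Start with NA everywhere, then fill in only the rows that matched;
# regmatches() returns one string per matching element
df$cad <- NA_character_
df$cad[m != -1] <- regmatches(df$sometext, m)
```
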
Prasanna Nandakumar