How do I identify a string using a wildcard using R?

Question

I'd like to count the duration of some words containing the number "0" after a string, as shown in the picture (I inserted a red dot under the words with the character I'd like to work with). Is there anything like a wildcard so that I can work only with the words containing this string?

[Picture]

Matt, welcome to SO! It is very helpful if you post *usable* data instead of just an image of it. This doesn't mean you cannot include an image (for demonstrative purposes, as you've done), but it makes it a bit easier for us when a representative sample is something we just highlight, copy, paste, and play with. Second, your description is unclear; *"0" after a string* is fine but what string? Third, what have you tried? This might be a perfect fit for regular expressions, but it would help if you show your level of effort. (This isn't a write-my-code service.) Thanks! — r2evans, Dec 08 '17 at 22:32
Some references for how to make a great reproducible question: https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example and https://stackoverflow.com/help/mcve. — r2evans, Dec 08 '17 at 22:32

score 1 · Answer 1 · answered Dec 07 '17 at 19:21

If the filename should not be part of the match, one can use the filename column to strip it from the text before applying the regex:

library(dplyr)
library(purrr)

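df %>%
  # strip each row's filename from its text (the filename is used as the pattern)
  mutate(no_filename = map2_chr(filename, text, ~gsub(.x, '', .y))) %>%
  # keep rows whose remaining text starts with "0"
  filter(grepl("^0", no_filename)) %>%
  # drop the helper column
  select(-no_filename)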

Result:

  filename      text     value
1       S2  S20XXXXX 0.2065314
2       S3  S30XXXXX 0.8146400
3       S4  S40XXXXX 0.8123895
4       S6  S60XXXXX 0.1111354
5       S7  S70XXXXX 0.1028646
6       S9  S90XXXXX 0.1306957
7       S9  S90XXXXX 0.3203732
8      S10 S100XXXXX 0.1876911

Note:

Notice that S100XXXXX is matched, but S101XXXXX is not: once the filename S10 is removed, only the first remainder starts with 0.
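
The same filtering can also be sketched in base R without regular expressions. This is only a sketch, not the answer's method: it assumes every text value begins with its filename, and the four-row data frame is made up for illustration.

```r
# Hypothetical sample mirroring the answer's filename/text columns
df <- data.frame(
  filename = c("S1", "S10", "S10", "S2"),
  text     = c("S10XXXXX", "S100XXXXX", "S101XXXXX", "S23XXXXX"),
  stringsAsFactors = FALSE
)

# Cut off the filename by its length, leaving the rest of each text value
rest <- substring(df$text, nchar(df$filename) + 1)

# Keep rows where the text really starts with the filename
# and the first character after it is "0"
keep <- startsWith(df$text, df$filename) & startsWith(rest, "0")
result <- df[keep, ]
```

Because the prefix is removed by length rather than by pattern, characters in a filename that are special in a regex would not affect the result.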

Data:

library(dplyr)
df = data.frame(filename = rep(paste0('S', 1:10), each = 5))
set.seed(123)
df = df %>%
  mutate(text = paste0(filename, sample(c(0:5), 50, replace = TRUE), 
                     paste(rep('X', 5), collapse = "")),
         value = runif(50))

Good solution - does exactly what appears to be requested. The only issue would be OP's data, which may be more nuanced than the simulated data used here. Voted up! — aiatay7n, Mar 25 '20 at 22:04

score 0 · Answer 2 · answered Dec 07 '17 at 18:16

Use:

grep('0',resultado2$TextGridLabel)

to find the rows with a 0. If you want to see the whole dataset subsetted by your search parameter, just use brackets:

resultado2[grep('0',resultado2$TextGridLabel),]

stevebroll

This wouldn't work correctly if there is an `S10XXXXXXX` in `TextGridLabel` – acylam Dec 07 '17 at 18:24
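
The concern in this comment can be illustrated with a small sketch. The labels below are made up, since the real TextGridLabel values are only visible in the picture, and the assumption is that a label may carry a 0 inside its leading file prefix.

```r
# Made-up labels; the real TextGridLabel values are only shown in the picture
resultado2 <- data.frame(
  TextGridLabel = c("S1a0", "S10a1", "S2e0", "S3i1"),
  stringsAsFactors = FALSE
)

# grep() returns the positions of every label containing a "0" anywhere
idx <- grep("0", resultado2$TextGridLabel)
matched <- resultado2$TextGridLabel[idx]
```

Here "S10a1" is returned along with "S1a0" and "S2e0", even though its only 0 belongs to the prefix S10.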

score 0 · Answer 3 · answered Dec 07 '17 at 18:17

You should read the help returned by ?regex. It will give you a summary of using regular expressions in R. It will also refer you to a variety of functions that you can use the regex with.

For example, if your data above was in a dataframe, df:

grep(x=df$TextGridLabel, pattern="^.*0.+$")

would return the indices of all values that contain a 0 followed by at least one more character (anything may come before the 0).

Cheers!

Nate

This wouldn't work correctly if there is an `S10XXXXXXX` in `TextGridLabel` – acylam Dec 07 '17 at 18:25
@useR I'm not an expert at using regex... but I can't reproduce your claim. Here's what I get when I try to see what happens: `> a <-c("S10XXXXXXXX","D10W","D111America")` `> grep(a,pattern="^.*0.+$")` `[1] 1 2`... This is what I would expect and I believe what was asked for. – Nate Dec 07 '17 at 18:36
I guess since OP's question is not entirely clear, there are different interpretations of what he really wanted. I would think that `"S10XXXXXXXX"` shouldn't be matched since there is no `0` after `S10`, where `S10` is a filename. If `S10` is _not_ the string that OP is referring to in "_the number "0" after a string_", then I agree your solution works as intended, but we won't know for sure until OP clarifies. – acylam Dec 07 '17 at 18:41
@useR Ah, yes, you are correct of course. This can all get very hairy if the strings don't have a structure conducive to identifying substrings or at least a clearer question/set of requirements as you point out. Matt Addison - If useR's interpretation is correct, we'd have to know if the beginning substring always starts S\d{3} or if it can vary and how to correctly identify it before we could exclude 0s in the beginning substring. – Nate Dec 07 '17 at 18:48
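
If the prefix did have a fixed shape, the exclusion described in this comment could be sketched as follows. The sketch assumes, purely for illustration, that every label starts with `S` followed by exactly three digits; the sample labels are made up, and the OP has not confirmed any such structure.

```r
# Made-up labels: an "S" plus three digits, then the part of interest
labels <- c("S100abc", "S100ab0", "S123a0b", "S123abc")

# Skip the assumed fixed prefix (S + three digits), then require a 0
# somewhere in the remainder of the label
hits <- grepl("^S\\d{3}.*0", labels)
selected <- labels[hits]
```

With this pattern "S100abc" is not selected, because its zeros sit inside the prefix, while "S100ab0" and "S123a0b" are. A prefix of varying length would need a different way to find where it ends.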

score 0 · Answer 4 · answered Dec 07 '17 at 19:46

The code below does work and it is basically what I need.

resultado2[grep('0',resultado2$TextGridLabel),]

However, I would like to avoid data such as S10XXXXXXX in TextGridLabel. I have just edited the original post so that the new picture illustrates better what I'd like to consider.
