This may be a very simple question, but I have little experience with regular expressions. This page is a good source of regex patterns, but I could not figure out how to use them in the following code:

data %>% filter(grepl("^A01H1", icl))

Question

I would like to extract the values in one column of my data frame that start with A01H1 followed by up to 2 more digits, for example A01H100, A01H140, A01H110. I could not find a solution despite a few attempts:

Attempts

I looked at this question, from which I used ^A01H1[0-9].{2} to select up to two more digits.

I also tried adding another character, ^A01H1[0-9][0-9][x-y], to stop after two digits.

Any help would be much appreciated :)

Amleto
  • `^A01H1.*?\d{2}` or `^A01H1\d{1,2}` or `^A01H1\d{1,2}(?!\d)`? Do you expect any text between `A01H1` and two digits? Do you expect any text after the 2 digits? – Wiktor Stribiżew Sep 10 '19 at 11:38
  • `"^A01H1\\d{1,2}$"`? Please provide a short example with values that should be matched and should not be matched ;) – kath Sep 10 '19 at 11:39
  • This `"^A01H1\\d{1,2}$"` works very well; could you explain it further? Unfortunately, `^A01H1.*?\d{2}`, `^A01H1\d{1,2}` and `^A01H1\d{1,2}(?!\d)` do not work. – Amleto Sep 10 '19 at 12:01
  • Did you try them properly? `grepl("^A01H1\\d{1,2}(?!\\d)", icl, perl=TRUE)` – Wiktor Stribiżew Sep 10 '19 at 12:02

2 Answers

You can use "^A01H1\\d{1,2}$". You figured out the first part ("^A01H1") yourself, so what does the second part ("\\d{1,2}$") do?

  • \d matches any digit and is equivalent to [0-9]. In an R string literal the backslash itself must be escaped, so we write \\d
  • {1,2} means we want 1 or 2 repetitions of \\d
  • $ marks the end of the string, so nothing may follow; this prevents matching more than 2 digits
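
As a quick check of the pattern described above, here is a minimal sketch in base R. The vector `icl` and its values are made up for illustration; the real column from the question is not shown, so treat them as assumptions.

```r
# Hypothetical sample values; the real `icl` column is not shown in the question
icl <- c("A01H1", "A01H10", "A01H100", "A01H140",
         "A01H1100", "A01H10x", "B01H100")

# TRUE where the value is A01H1 followed by one or two digits and nothing else
keep <- grepl("^A01H1\\d{1,2}$", icl)

# Values that pass the filter
icl[keep]
```

With these sample values, only "A01H10", "A01H100" and "A01H140" should be kept. The same pattern can be dropped into the `filter(grepl(...))` call from the question.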
kath

It looks as if you want to match a part of a string that starts with A01H1, then contains 1 or 2 digits, and is then not followed by any digit.

You may use

^A01H1\d{1,2}(?!\d)

See the regex demo. If there can be no text after two digits at all, replace (?!\d) with $.

Details

  • ^ - start of string
  • A01H1 - literal string
  • \d{1,2} - one to two digits
  • (?!\d) - no digit allowed immediately to the right
  • $ - end of string

In R, you could use it like

grepl("^A01H1\\d{1,2}(?!\\d)", icl, perl=TRUE)

Or, with the string end anchor,

grepl("^A01H1\\d{1,2}$", icl)

Note that perl=TRUE is only necessary when using PCRE-specific syntax such as (?!\d), a negative lookahead.
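
To make the difference between the two variants concrete, here is a small sketch comparing them side by side. The sample values are hypothetical and chosen only to show where the lookahead and the end anchor disagree.

```r
# Hypothetical values, chosen to show where the two patterns differ
icl <- c("A01H100", "A01H100 extra", "A01H1000", "A01H1")

# Negative lookahead: one or two digits, then anything except another digit
lookahead <- grepl("^A01H1\\d{1,2}(?!\\d)", icl, perl = TRUE)

# End anchor: one or two digits, then the string must end
anchored <- grepl("^A01H1\\d{1,2}$", icl)

# Side-by-side view of the two results
data.frame(icl, lookahead, anchored)
```

Only "A01H100 extra" should differ here: the lookahead version accepts it because a space follows the digits, while the anchored version rejects any trailing text.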

Wiktor Stribiżew