Regex to match given text

Question

I have the following string:

</p><drupal-entity data-view-mode="oembed_display"></drupal-entity><p><strong>Designer Crush:</strong></p><drupal-entity data-view-mode="oembed_display"></drupal-entity><p>&nbsp;</p>

My pattern is:

<drupal-entity((?!<drupal-entity).)*?><\/drupal-entity><p>&nbsp;<\/p>

It matches fine, but the result also contains an extra character, ", as another group. I just want this string to be matched:

<drupal-entity data-view-mode="oembed_display"></drupal-entity><p>&nbsp;</p>

Example: https://regex101.com/r/Aeqxxy/1

So the regex you're using is `<drupal-entity((?!<drupal-entity).)*?><\/drupal-entity><p>&nbsp;<\/p>`, and what you want it to produce is `<drupal-entity data-view-mode="oembed_display"></drupal-entity><p>&nbsp;</p>`, but what are you currently getting back as output instead? Can you post the exact incorrect output you're seeing now? — anandsun, Jul 18 '17 at 13:20
Have you got a bracket in the wrong place? `drupal-entity).*?)` — Mr Mystery Guest, Jul 18 '17 at 13:23
If you just don't want to capture the `"` then do as Wiktor has suggested and add `?:` just inside the capture group. If this is not what you want, please clarify your question. ...and ping Wiktor after you do so that he can post an answer. — mickmackusa, Jul 19 '17 at 02:05
@WiktorStribiżew your solution worked for me, so if you post it as an answer, I can accept it. Thanks. — mks, Jul 19 '17 at 05:40
@mickmackusa: It is true that negated character classes work faster than a lookaround-based tempered greedy token, but they match rather different strings. — Wiktor Stribiżew, Jul 19 '17 at 06:24
@mickmackusa: See [OP regex](https://regex101.com/r/Aeqxxy/3) vs. [your regex](https://regex101.com/r/Aeqxxy/4). — Wiktor Stribiżew, Jul 19 '17 at 06:52

mickmackusa · Answer 1 · 2017-07-19T11:15:34.260

If there are no tags within the drupal-entity tags, then my pattern will provide a highly efficient match. However, my pattern cannot be trusted if there is so much as a possibility of a > between the drupal tags.

So I am saying that my pattern compromises on accuracy for speed, but I am only making this compromise because the sample input text doesn't show any inner tags, so I am running with an assumption.

Pattern: (Demo)

@<drupal-entity[^>]+></drupal-entity><p>&nbsp;</p>@
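The trade-off described above can be sketched in a few lines. This is only an illustration: the `@` delimiters suggest PHP, and Python's `re` module is used here purely as an assumption to get a runnable example. The `tricky` input is hypothetical (it is not from the question) and stands for markup with a > inside the opening tag; the `tempered` pattern is the lookahead-based variant with a non-capturing group.

```python
import re

# Sample input from the question.
html = (
    '</p><drupal-entity data-view-mode="oembed_display"></drupal-entity>'
    '<p><strong>Designer Crush:</strong></p>'
    '<drupal-entity data-view-mode="oembed_display"></drupal-entity>'
    '<p>&nbsp;</p>'
)

# Hypothetical input (not from the question): a ">" inside an attribute value.
tricky = '<drupal-entity data-caption="a > b"></drupal-entity><p>&nbsp;</p>'

# Negated character class: [^>]+ consumes everything up to the first ">".
negated = re.compile(r'<drupal-entity[^>]+></drupal-entity><p>&nbsp;</p>')

# Lookahead-based (tempered) variant, using a non-capturing group.
tempered = re.compile(
    r'<drupal-entity(?:(?!<drupal-entity).)*?></drupal-entity><p>&nbsp;</p>'
)

wanted = '<drupal-entity data-view-mode="oembed_display"></drupal-entity><p>&nbsp;</p>'

sample_match = negated.search(html)
print(sample_match.group(0) == wanted)      # True: only the second entity matches
print(negated.search(tricky))               # None: the inner ">" stops [^>]+ too early
print(tempered.search(tricky) is not None)  # True: the tempered variant still matches
```

On the sample input both patterns agree; they only diverge on input like `tricky`, which is the accuracy side of the compromise.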

score 1 · Answer 2 · edited Sep 23 '17 at 23:12

Your question is about how to avoid additional items in the match array. Note that these items with IDs from 1 and up are added whenever a pattern contains a capturing group. Capturing groups are made with a pair of unescaped parentheses.

Your pattern contains the ((?!<drupal-entity).)*? tempered greedy token, where you used (...) and thus created a capturing group. This pattern matches any char (.) that is not the starting point of a <drupal-entity substring and captures it into Group 1 (which creates the additional item), and it matches 0+ such chars, as few as possible. Thus, it is not the same pattern @mickmackusa suggests: it will match up to the first ></drupal-entity><p>&nbsp;</p> substring. It means it won't handle nested tags, just be warned.

If possible, use an HTML parser.
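The answer does not say which parser to use, so the following is only a sketch of one possibility, assuming Python's standard-library `html.parser`. The names `EventCollector` and `find_entities_before_empty_paragraph` are hypothetical helpers invented for this example. It records the tag and text events in order and looks for a `drupal-entity` element that is immediately followed by `<p>&nbsp;</p>`.

```python
from html.parser import HTMLParser


class EventCollector(HTMLParser):
    """Records start tags, end tags, and text in document order."""

    def __init__(self):
        # With convert_charrefs=True, "&nbsp;" is delivered as "\xa0" text.
        super().__init__(convert_charrefs=True)
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append(("start", tag, dict(attrs)))

    def handle_endtag(self, tag):
        self.events.append(("end", tag, None))

    def handle_data(self, data):
        self.events.append(("data", data, None))


def find_entities_before_empty_paragraph(markup):
    """Return the attributes of each <drupal-entity> directly followed by <p>&nbsp;</p>."""
    collector = EventCollector()
    collector.feed(markup)
    collector.close()
    shape = [
        ("start", "drupal-entity"),
        ("end", "drupal-entity"),
        ("start", "p"),
        ("data", "\xa0"),
        ("end", "p"),
    ]
    events = collector.events
    found = []
    for i in range(len(events) - len(shape) + 1):
        window = [(kind, value) for kind, value, _ in events[i:i + len(shape)]]
        if window == shape:
            found.append(events[i][2])  # attribute dict of the opening tag
    return found


# Sample input from the question.
html = (
    '</p><drupal-entity data-view-mode="oembed_display"></drupal-entity>'
    '<p><strong>Designer Crush:</strong></p>'
    '<drupal-entity data-view-mode="oembed_display"></drupal-entity>'
    '<p>&nbsp;</p>'
)

result = find_entities_before_empty_paragraph(html)
print(result)
```

Because the parser tokenizes attributes itself, a > inside an attribute value does not confuse this approach the way it can confuse a regex.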

To solve the issue, you need to replace the capturing group with a non-capturing one, and it will already yield the desired results: (?:(?!<drupal-entity).)*? (note the ?: added after the initial ().
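The effect of the ?: change can be checked with a short sketch. It uses Python's `re` module, which is an assumption made for the sake of a runnable example (the question links to regex101 and does not name a language); the two patterns are the ones discussed above, differing only in the group type.

```python
import re

# Sample input from the question.
html = (
    '</p><drupal-entity data-view-mode="oembed_display"></drupal-entity>'
    '<p><strong>Designer Crush:</strong></p>'
    '<drupal-entity data-view-mode="oembed_display"></drupal-entity>'
    '<p>&nbsp;</p>'
)

wanted = '<drupal-entity data-view-mode="oembed_display"></drupal-entity><p>&nbsp;</p>'

# Original pattern: (...) is a capturing group, so Group 1 exists.
capturing = re.compile(
    r'<drupal-entity((?!<drupal-entity).)*?></drupal-entity><p>&nbsp;</p>'
)

# Fixed pattern: (?:...) groups without capturing.
non_capturing = re.compile(
    r'<drupal-entity(?:(?!<drupal-entity).)*?></drupal-entity><p>&nbsp;</p>'
)

m1 = capturing.search(html)
m2 = non_capturing.search(html)

print(m1.group(0) == wanted)  # True: the overall match is already correct
print(m1.groups())            # ('"',): the last char consumed by the repeated group
print(m2.group(0) == wanted)  # True
print(m2.groups())            # (): no extra item
```

A repeated capturing group keeps only its last iteration, which is why the extra item is the single " that precedes the closing > of the opening tag.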

I would just like to offer that using a negated character class to match the remaining portion of the opening ` — mickmackusa, Jul 19 '17 at 11:11
@mickmackusa If we are talking optimizations, you may even consider [`]*>[^<]*(?:<(?!drupal-entity)[^<]*+)*?`](https://regex101.com/r/Aeqxxy/6). — Wiktor Stribiżew, Jul 19 '17 at 11:15
