
I have the following string:

</p><drupal-entity data-view-mode="oembed_display"></drupal-entity><p><strong>Designer Crush:</strong></p><drupal-entity data-view-mode="oembed_display"></drupal-entity><p>&nbsp;</p>

My pattern is:

<drupal-entity((?!<drupal-entity).)*?><\/drupal-entity><p>&nbsp;<\/p>

It matches correctly, but it also returns an extra character (") as a separate group. I just want this string to be matched:

<drupal-entity data-view-mode="oembed_display"></drupal-entity><p>&nbsp;</p>

Example: https://regex101.com/r/Aeqxxy/1
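To see the extra item concretely, here is a minimal reproduction. It uses Python's re module as a stand-in (an assumption: the @ delimiters in the answers suggest PHP's preg_* functions), and the \/ escapes are dropped because Python patterns have no delimiters to escape.

```python
import re

# Sample input copied from the question.
TEXT = (
    '</p><drupal-entity data-view-mode="oembed_display"></drupal-entity>'
    '<p><strong>Designer Crush:</strong></p>'
    '<drupal-entity data-view-mode="oembed_display"></drupal-entity><p>&nbsp;</p>'
)

# The question's pattern; "\/" is only needed when "/" is the PHP delimiter.
PATTERN = r'<drupal-entity((?!<drupal-entity).)*?></drupal-entity><p>&nbsp;</p>'

m = re.search(PATTERN, TEXT)
print(m.group(0))  # the string the question wants
print(m.groups())  # ('"',): the extra item produced by the capturing group
```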

mks

2 Answers


If there are no tags nested inside the drupal-entity tags, my pattern will match very efficiently. However, it cannot be trusted if there is even a possibility of a > appearing between the drupal-entity tags.

In other words, my pattern trades accuracy for speed. I am only making this compromise because the sample input shows no inner tags, so I am running with that assumption.

Pattern: (Demo)

@<drupal-entity[^>]+></drupal-entity><p>&nbsp;</p>@
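A quick check of this pattern, sketched in Python's re as a stand-in for PHP (an assumption; the @ delimiters are simply dropped). The second input is a hypothetical one with a > inside an attribute value, which illustrates the caveat above.

```python
import re

TEXT = (
    '</p><drupal-entity data-view-mode="oembed_display"></drupal-entity>'
    '<p><strong>Designer Crush:</strong></p>'
    '<drupal-entity data-view-mode="oembed_display"></drupal-entity><p>&nbsp;</p>'
)

# mickmackusa's pattern with the PHP "@" delimiters removed.
PATTERN = r'<drupal-entity[^>]+></drupal-entity><p>&nbsp;</p>'

# No capturing groups, so findall returns whole matches.
matches = re.findall(PATTERN, TEXT)
print(matches)

# Hypothetical input with a ">" inside an attribute value: [^>]+ stops at
# that ">", the literal "></drupal-entity>" then fails, so nothing matches.
tricky = '<drupal-entity data-caption="a>b"></drupal-entity><p>&nbsp;</p>'
tricky_matches = re.findall(PATTERN, tricky)
print(tricky_matches)  # []
```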
mickmackusa

Your question is about how to avoid additional items in the match array. Items with indices 1 and up are added whenever a pattern contains a capturing group, and a capturing group is created by a pair of unescaped parentheses.

Your pattern contains the tempered greedy token ((?!<drupal-entity).)*?, and because you wrote it with (...), it creates a capturing group. This subpattern matches any char (.) that is not the start of a <drupal-entity substring and captures it into Group 1 (that is the additional item). It matches 0+ such chars, as few as possible, and since a repeated group only keeps its last repetition, Group 1 ends up holding just the final " before the >. Note that it is not the same as the pattern @mickmackusa suggests: it will match up to the first ></drupal-entity><p>&nbsp;</p> substring. This means it won't handle nested tags, so be warned.
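Two of the points above can be checked directly, sketched here with Python's re as a stand-in for PHP (an assumption): a quantified capturing group keeps only its last repetition, and the negative lookahead keeps the token from running across a later <drupal-entity, so the match starts at the second entity in the sample rather than the first.

```python
import re

# A quantified capturing group keeps only its last repetition,
# which is why Group 1 ends up holding just the final '"'.
last = re.fullmatch(r'(.)*', 'abc').group(1)
print(last)  # c

TEXT = (
    '</p><drupal-entity data-view-mode="oembed_display"></drupal-entity>'
    '<p><strong>Designer Crush:</strong></p>'
    '<drupal-entity data-view-mode="oembed_display"></drupal-entity><p>&nbsp;</p>'
)
PATTERN = r'<drupal-entity((?!<drupal-entity).)*?></drupal-entity><p>&nbsp;</p>'
m = re.search(PATTERN, TEXT)

# The lookahead stops the token from crossing a later "<drupal-entity",
# so a match starting at the first entity cannot reach the <p>&nbsp;</p>.
print(m.start() == TEXT.rindex('<drupal-entity'))  # True
```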

If possible, use an HTML parser.
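As one illustration of the parser route, here is a minimal sketch using Python's standard html.parser module (a swap-in; in PHP the analogous tool would be DOMDocument). It flattens the markup into start-tag, end-tag and text events, then looks for a drupal-entity element immediately followed by a paragraph whose only content is a non-breaking space. The class and function names are hypothetical, and the sketch assumes the empty paragraph contains exactly &nbsp; with no surrounding whitespace.

```python
from html.parser import HTMLParser


class EventCollector(HTMLParser):
    """Hypothetical helper: records tags and text in document order."""

    def __init__(self):
        # convert_charrefs=True turns "&nbsp;" into "\xa0" in handle_data.
        super().__init__(convert_charrefs=True)
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append(("start", tag, dict(attrs)))

    def handle_endtag(self, tag):
        self.events.append(("end", tag, None))

    def handle_data(self, data):
        self.events.append(("data", data, None))


def entities_before_empty_paragraph(html):
    """Return attribute dicts of drupal-entity elements followed by <p>&nbsp;</p>."""
    parser = EventCollector()
    parser.feed(html)
    parser.close()
    ev = parser.events
    found = []
    for i in range(len(ev) - 4):
        if (ev[i][:2] == ("start", "drupal-entity")
                and ev[i + 1][:2] == ("end", "drupal-entity")
                and ev[i + 2][:2] == ("start", "p")
                and ev[i + 3] == ("data", "\xa0", None)
                and ev[i + 4][:2] == ("end", "p")):
            found.append(ev[i][2])
    return found


TEXT = (
    '</p><drupal-entity data-view-mode="oembed_display"></drupal-entity>'
    '<p><strong>Designer Crush:</strong></p>'
    '<drupal-entity data-view-mode="oembed_display"></drupal-entity><p>&nbsp;</p>'
)
print(entities_before_empty_paragraph(TEXT))
```

Unlike the regexes, this does not care whether an attribute value contains >, at the cost of more code.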

To solve the issue, replace the capturing group with a non-capturing one, and the pattern will already yield the desired result: (?:(?!<drupal-entity).)*? (note the ?: added after the opening ().
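The fix can be sketched with Python's re as a stand-in for PHP (an assumption): with (?:...) the match object reports no groups, and findall, which returns group contents whenever a pattern has groups, now returns the whole match.

```python
import re

TEXT = (
    '</p><drupal-entity data-view-mode="oembed_display"></drupal-entity>'
    '<p><strong>Designer Crush:</strong></p>'
    '<drupal-entity data-view-mode="oembed_display"></drupal-entity><p>&nbsp;</p>'
)

# Same pattern with the capturing group turned into a non-capturing one.
FIXED = r'<drupal-entity(?:(?!<drupal-entity).)*?></drupal-entity><p>&nbsp;</p>'

m = re.search(FIXED, TEXT)
print(m.group(0))
print(m.groups())               # (): no extra items
print(re.findall(FIXED, TEXT))  # whole matches, since there are no groups
```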

Graham
Wiktor Stribiżew
  • I would just like to offer that using a negated character class to match the remaining portion of the opening ` – mickmackusa Jul 19 '17 at 11:11
  • @mickmackusa If we are talking optimizations, you may even consider ]*>[^<]*(?:<(?!drupal-entity)[^<]*+)*? (demo: https://regex101.com/r/Aeqxxy/6). – Wiktor Stribiżew Jul 19 '17 at 11:15